Nucleotide Sequence of Escherichia coli Pathogenicity Islands 
[0002] This invention was made with United States government support awarded by
the following agencies: 

NIH Grant # AI20323; AI25547. 
The United States has certain rights to this invention. 

Background of the Invention 
Field of the Invention 

[0003] The present invention relates to novel genes located in two chromosomal 
regions within E. coli that are associated with virulence. These chromosomal regions are 
known as pathogenicity islands (PAIs). 

Related Background Art 

[0004] Escherichia coli (E. coli) is a normal inhabitant of the intestine of humans and 
various animals. Pathogenic E. coli strains are able to cause infections of the intestine 
(intestinal E. coli strains) and of other organs such as the urinary tract (uropathogenic E.
coli) or the brain (extraintestinal E. coli). Intestinal pathogenic E. coli are a well 
established and leading cause of severe infantile diarrhea in the developing world. 
Additionally, cases of newborn meningitis and sepsis have been attributed to E. coli
pathogens. 

[0005] In contrast to non-pathogenic isolates, pathogenic E. coli produce pathogenicity 
factors which contribute to the ability of strains to cause infectious diseases (Mühldorfer,
I. and Hacker, J., Microb. Pathogen. 16:171-181 1994). Adhesins facilitate binding of
pathogenic bacteria to host tissues. Pathogenic E. coli strains also express toxins including
haemolysins, which are involved in the destruction of host cells, and surface structures
such as O-antigens, capsules or membrane proteins, which protect the bacteria from the
action of phagocytes or the complement system (Ritter, et al., Mol. Microbiol. 17:109-212
1995).

[0006] The genes coding for pathogenicity factors of intestinal E. coli are located on 
large plasmids, phage genomes or on the chromosome. In contrast to intestinal E. coli,
pathogenicity determinants of uropathogenic and other extraintestinal E. coli are, in most
cases, located on the chromosome. Id.

[0007] Large chromosomal regions in pathogenic bacteria that encode adjacently 
located virulence genes have been termed pathogenicity islands ("PAIs"). PAIs are
large fragments of DNA which comprise a group of virulence genes behaving
as a distinct molecular and functional unit, much like an island within the bacterial
chromosome. For example, intact PAIs appear to transfer between organisms and confer 
complex virulence properties to the recipient bacteria. 

[0008] Chromosomal PAIs in bacterial cells have been described in increasing detail 
over recent years. For example, J. Hacker and co-workers described two large, unstable 
regions in the chromosome of uropathogenic Escherichia coli strain 536 as PAI-I and PAI- 
II (Hacker J., et al, Microbiol Pathog. 8:213-25 1990). Hacker found that PAI-I and 
PAI-II containing virulence regions can be lost by spontaneous deletion due to 
recombination events. Both of these PAIs were found to encode multiple virulence genes, 
and their loss resulted in reduced hemolytic activity, serum resistance, mannose-resistant 
hemagglutination, uroepithelial cell binding, and mouse virulence of the E. coli (Knapp, S.
et al., J. Bacteriol. 168:22-30 1986). Therefore, pathogenicity islands are characterized by
their ability to confer complex virulence phenotypes to bacterial cells. 
[0009] In addition to E. coli, specific deletion of large virulence regions has been 
observed in other bacteria such as Yersinia pestis. For example, Fetherston and co- 
workers found that a 102-kb region of the Y. pestis chromosome lost by spontaneous 
deletion resulted in the loss of many Y. pestis virulence phenotypes (Fetherston, J.D. and
Perry, R.D., Mol Microbiol 13:697-708 1994, Fetherston, et al, Mol Microbiol 6:2693- 
704 1992). In this instance, the deletion appeared to be due to recombination within 2.2- 
kb repetitive elements at both ends of the 102-kb region. 

[0010] It is possible that deletion of PAIs may benefit the organism by modulating 
bacterial virulence or genome size during infection. PAIs may also represent foreign DNA 
segments that were acquired during bacterial evolution that conferred important 
pathogenic properties to the bacteria. Flanking repeats, as observed in Y. pestis
for example, may suggest a common mechanism by which these virulence genes were
integrated into the bacterial chromosomes. 

[0011] Integration of the virulence genes into bacterial chromosomes was further 
elucidated by the discovery and characterization of a locus of enterocyte effacement (the 
LEE locus) in enteropathogenic E. coli (McDaniel, et al, Proc. Natl Acad Sci. (USA) 
92:1664-8 1995). The LEE locus comprises 35-kb and encodes many genes required for 
these bacteria to "invade" and degrade the apical structure of enterocytes, causing diarrhea.
Although the LEE and PAI-I loci encode different virulence genes, these elements are 
located at the exact same site in the E. coli genome and contain the same DNA sequence 
within their right-hand ends, thus suggesting a common mechanism for their insertion. 
[0012] Besides being found in enteropathogenic E. coli, the LEE element is also
present in rabbit diarrheal E. coli, Hafnia alvei, and Citrobacter freundii biotype 4280, all
of which induce attaching and effacing lesions on the apical face of enterocytes. The LEE
locus appears to be inserted in the bacterial chromosome as a discrete molecular and
functional virulence unit in the same fashion as PAI-I, PAI-II, and Yersinia PAI.
[0013] Along these same lines, a 40-kb Salmonella typhimurium PAI was 
characterized on the bacterial chromosome which encodes genes required for Salmonella 
entry into nonphagocytic epithelial cells of the intestine (Mills, D.M., et al., Mol.
Microbiol. 15:749-59 1995). Like the LEE element, this PAI confers to Salmonella the
ability to invade intestinal cells, and hence may likewise be characterized as an "invasion" 
PAI. 

[0014] The pathogenicity islands described above all possess the common feature of 
conferring complex virulence properties to the recipient bacteria. However, they may be 
separated into two types by their respective contributions to virulence. PAI-I, PAI-II, and 
the Y. pestis PAI confer multiple virulence phenotypes, while the LEE and the S.
typhimurium "invasion" PAI encode many genes specifying a single, complex virulence 
process. 

[0015] It is advantageous to characterize closely-related bacteria that contain or do not 
contain the PAI by the isolation of a discrete molecular and functional unit on the bacterial 
chromosome. Since the presence versus the absence of essential virulence genes can often 
distinguish closely-related virulent versus avirulent bacterial strains or species, 
experiments have been conducted to identify virulence loci and potential PAIs by isolating 
DNA sequences that are unique to virulent bacteria (Bloch, C.A., et al., J. Bacteriol.
176:7121-5 1994, Groisman, E.A., J. 12:3779-87 1993).
[0016] At least two PAIs are present in E. coli J96. These PAIs, PAI IV and PAI V,
are linked to tRNA loci but at sites different from those occupied by other known E. coli
PAIs. Swenson et al., Infect. and Immun. 64:3136-3143 (1996).

[0017] The era of true comparative genomics has been ushered in by high through-put
genomic sequencing and analysis. The first two complete bacterial genome sequences,
those of Haemophilus influenzae and Mycoplasma genitalium, were recently described
(Fleischmann, R.D., et al., Science 269:496 (1995); Fraser, C.M., et al., Science 270:397
(1995)). Large scale DNA sequencing efforts also have produced an extensive collection
of sequence data from eukaryotes, including Homo sapiens (Adams, M.D., et al., Nature
377:3 (1995)) and Saccharomyces cerevisiae (Levy, J., Yeast 10:1689 (1994)).
[0018] The need continues to exist for the application of high through-put sequencing
and analysis to study genomes and subgenomes of infectious organisms. Further, a need 
exists for genetic markers that can be employed to distinguish closely-related virulent and 
avirulent strains of a given bacterium.

Summary of the Invention 

[0019] The present invention is based on the high through-put, random, sequencing of 
cosmid clones covering two pathogenic islands (PAIs) of uropathogenic Escherichia coli 
strain J96 (O4:K6; E. coli J96). PAIs are large fragments of DNA which comprise
pathogenicity determinants. PAI IV is located approximately at 64 min (near pheV) on the
E. coli chromosome and is greater than 170 kilobases in size. PAI V is located at
approximately 94 min (at pheR) on the E. coli chromosome and is approximately 106 kb in
size. These PAIs differ in location from the PAIs described by Hacker and colleagues for
uropathogenic strain 536 (PAI I, 82 minutes {selC} and PAI II, 97 minutes {leuX}).
[0020] The location of the PAIs relative to one another and the cosmid clones covering 
the J96 PAIs is shown in Figure 1. The present invention relates to the nucleotide 
sequences of 142 fragments of DNA (contigs) covering the PAI IV and PAI V regions of 
the E. coli J96 chromosome. The nucleotide sequences shown in SEQ ID NOs: 1 through 
142 were obtained by shotgun sequencing eleven E. coli J96 subclones, which were 
deposited in two pools on September 23, 1996 at the American Type Culture Collection, 
12301 Park Lawn Drive, Rockville, Maryland 20852, and given accession numbers 97726 
(includes 7 cosmid clones covering PAI IV) and 97727 (includes 4 cosmid clones
covering PAI V). The deposited sets or "pools" of clones are more fully described in
Example 1. In addition, E. coli strain J96 was also deposited at the American Type
Culture Collection on September 23, 1996, and given accession number 98176.
[0021] Three hundred fifty-one open reading frames have been thus far identified in
the 142 contigs described by SEQ ID NOs: 1 through 142. Thus, the present invention is 
directed to isolated nucleic acid molecules comprising open reading frames (ORFs) 
encoding E. coli proteins that are located in two pathogenic island regions of the 
chromosome of uropathogenic E. coli J96. 

[0022] The present invention also relates to variants of the nucleic acid molecules of 
the present invention, which encode portions, analogs or derivatives of E. coli J96 PAI 
proteins. Further embodiments include isolated nucleic acid molecules comprising a 
polynucleotide having a nucleotide sequence at least 90% identical, and more preferably at 
least 95%, 96%, 97%, 98% or 99% identical, to the nucleotide sequence of an E. coli J96 
PAI ORF described herein. 

[0023] The present invention also relates to recombinant vectors, which include the
isolated nucleic acid molecules of the present invention, host cells containing the
recombinant vectors, as well as methods for making such vectors and host cells for E. coli
J96 PAI protein production by recombinant techniques.

[0024] The invention further provides isolated polypeptides encoded by the E. coli J96 
PAI ORFs. It will be recognized that some amino acid sequences of the polypeptides 
described herein can be varied without significant effect on the structure or function of the 
protein. If such differences in sequence are contemplated, it should be remembered that 
there will be critical areas on the protein which determine activity. In general, it is 
possible to replace residues which form the tertiary structure, provided that residues 
performing a similar function are used. In other instances, the type of residue may be 
completely unimportant if the alteration occurs at a non-critical region of the protein. 
[0025] In another aspect, the invention provides a peptide or polypeptide comprising 
an epitope-bearing portion of a polypeptide of the invention. The epitope-bearing portion 
is an immunogenic or antigenic epitope useful for raising antibodies. 
[0026] The invention further provides a vaccine comprising one or more E. coli J96 
PAI antigens together with a pharmaceutically acceptable diluent, carrier, or excipient, 
wherein the one or more antigens are present in an amount effective to elicit protective 
antibodies in an animal to pathogenic E. coli, such as strain J96. 

[0027] The invention also provides a method of eliciting a protective immune response 
in an animal comprising administering to the animal the above-described vaccine. 
[0028] The invention further provides a method for identifying pathogenic E. coli in 
an animal comprising analyzing tissue or body fluid from the animal for one or more of: 

(a) polynucleic acids encoding an open reading frame listed in Tables 1-4;

(b) polypeptides encoded for by an open reading frame listed in Tables 1-4; or 
(c) antibodies specific to polypeptides encoded for by an open reading frame
listed in Tables 1-4. 

[0029] The invention further provides a nucleic acid probe for the detection of the 
presence of one or more E. coli PAI nucleic acids (nucleic acids encoding one or more
ORFs as listed in Tables 1-4) in a sample from an individual comprising one or more 
nucleic acid molecules sufficient to specifically detect under stringent hybridization 
conditions the presence of the above-described molecule in the sample. 
[0030] The invention also provides a method of detecting E. coli PAI nucleic acids in 
a sample comprising: 

a) contacting the sample with the above-described nucleic acid probe, under 
conditions such that hybridization occurs, and 

b) detecting the presence of the probe bound to an E. coli PAI nucleic acid. 
[0031] The invention further provides a kit for detecting the presence of one or more 
E. coli PAI nucleic acids in a sample comprising at least one container means having
disposed therein the above-described nucleic acid probe. 

[0032] The invention also provides a diagnostic kit for detecting the presence of 
pathogenic E. coli in a sample comprising at least one container means having disposed 
therein one or more of the above-described antibodies. 

[0033] The invention also provides a diagnostic kit for detecting the presence of 
antibodies to pathogenic E. coli in a sample comprising at least one container means 
having disposed therein one or more of the above-described antigens. 

Brief Description of the Figures 

[0034] Figure 1 is a schematic diagram of cosmid clones derived from E. coli J96 
pathogenicity island and map positions of known E. coli PAIs (not drawn to scale). The
gray bar represents the E. coli K-12 chromosome with minute demarcations of PAI
junction points located above the bar. E. coli J96 overlapping cosmid clones are
represented by hatched bars (overlap not drawn to scale) with positions of hly, pap, and
prs operons indicated above bar. The PAIs and estimated sizes are shown above and 
below the K-12 chromosome map. 

[0035] Figure 2 is a block diagram of a computer system 102 that can be used to 
implement the computer-based systems of the present invention.

Detailed Description of the Invention 

[0036] The present invention is based on high through-put, random sequencing of a 
uropathogenic strain of Escherichia coli. The DNA sequences of contiguous DNA
fragments covering the pathogenicity islands, PAI IV (also referred to as PAIJ96(pheV)) and
PAI V (also referred to as PAIJ96(pheU)) from the chromosome of the E. coli uropathogenic
strain, J96 (04:K6) were determined. The sequences were used for DNA and protein 
sequence similarity searches of the database. 

[0037] The primary nucleotide sequences generated by shotgun sequencing cosmid
clones of the PAI IV and PAI V regions of the E. coli chromosome are provided in SEQ
ID NOs: 1 through 142. These sequences represent contiguous fragments of the PAI DNA.
As used herein, the "primary sequence" refers to the nucleotide sequence represented by
the IUPAC nomenclature system. The present invention provides the nucleotide
sequences of SEQ ID NOs: 1 through 142, or representative fragments thereof, in a form
that can be readily used, analyzed, and interpreted by a skilled artisan. Within these 142 
sequences, there have been thus far identified 351 open reading frames (ORFs) that are 
described in greater detail below. 

[0038] As used herein, a "representative fragment" refers to E. coli J96 PAI protein- 
encoding regions (also referred to herein as open reading frames or ORFs), expression 
modulating fragments, and fragments that can be used to diagnose the presence of E. coli
in a sample. A non-limiting identification of such representative fragments is provided in
Tables 1 through 6. As described in detail below, representative fragments of the present
invention further include nucleic acid molecules having a nucleotide sequence at least 95%
identical, preferably at least 96%, 97%, 98%, or 99% identical, to an ORF identified in
Tables 1 through 6. 

[0039] As indicated above, the nucleotide sequence information provided in SEQ ID 
NOs: 1 through 142 was obtained by sequencing cosmid clones covering the PAIs located
on the chromosome of E. coli J96 using a megabase shotgun sequencing method. The
sequences provided in SEQ ID NOs: 1 through 142 are highly accurate, although not
necessarily a 100% perfect, representation of the nucleotide sequences of contiguous
stretches of DNA (contigs) which include the ORFs located on the two pathogenicity
islands of E. coli J96. As discussed in detail below, using the information provided in
SEQ ID NOs: 1 through 142 and in Tables 1 through 6 together with routine cloning and
sequencing methods, one of ordinary skill in the art would be able to clone and sequence 
all "representative fragments" of interest including open reading frames (ORFs) encoding 
a large variety of E. coli J96 PAI proteins. In rare instances, this may reveal a nucleotide 
sequence error present in the nucleotide sequences disclosed in SEQ ID NOs: 1 through 
142. Thus, once the present invention is made available (i.e., once the information in SEQ 
ID NOs: 1 through 142 and in Tables 1 through 6 has been made available), resolving a
rare sequencing error would be well within the skill of the art. Nucleotide sequence
editing software is publicly available. For example, Applied Biosystems' (AB)
AutoAssembler can be used as an aid during visual inspection of nucleotide sequences.
[0040] Even if all of the rare sequencing errors were corrected, it is predicted that the 
resulting nucleotide sequences would still be at least about 99.9% identical to the reference 
nucleotide sequences in SEQ ID NOs: 1 through 142. Thus, the present invention further 
provides nucleotide sequences that are at least 99.9% identical to the nucleotide sequence 
of SEQ ID NOs: 1 through 142 in a form which can be readily used, analyzed and 
interpreted by the skilled artisan. Methods for determining whether a nucleotide sequence 
is at least 99.9% identical to a reference nucleotide sequence of the present invention are 
described below. 


Nucleic Acid Molecules 

[0041] The present invention is directed to isolated nucleic acid fragments of the PAIs 
of E. coli J96. Such fragments include, but are not limited to, nucleic acid molecules 
encoding polypeptides (hereinafter open reading frames (ORFs)), nucleic acid molecules 
that modulate the expression of an operably linked ORF (hereinafter expression 
modulating fragments (EMFs)), and nucleic acid molecules that can be used to diagnose 
the presence of E. coli in a sample (hereinafter diagnostic fragments (DFs)). 
[0042] By isolated nucleic acid molecule(s) is intended a nucleic acid molecule, DNA 
or RNA, that has been removed from its native environment. For example, recombinant 
DNA molecules contained in a vector are considered isolated for the purposes of the 
present invention. Further examples of isolated DNA molecules include recombinant 
DNA molecules maintained in heterologous host cells, purified (partially or substantially) 
DNA molecules in solution, and nucleic acid molecules produced synthetically. Isolated 
RNA molecules include in vitro RNA transcripts of the DNA molecules of the present 
invention. 

[0043] In one embodiment, E. coli J96 PAI DNA can be mechanically sheared to 
produce fragments about 15-20 kb in length, which can be used to generate an E. coli J96 
PAI DNA library by insertion into lambda clones as described in Example 1 below. 
Primers flanking an ORF described in Tables 1 through 6 can then be generated using the 
nucleotide sequence information provided in SEQ ID NOs: 1 through 142. The 
polymerase chain reaction (PCR) is then used to amplify and isolate the ORF from the
lambda DNA library. PCR cloning is well known in the art. Thus, given SEQ ID NOs: 1
through 142, and Tables 1 through 6, it would be routine to isolate any ORF or other
representative fragment of the E. coli J96 PAI subgenomes. Isolated nucleic acid
molecules of the present invention include, but are not limited to, single stranded and
double stranded DNA, and single stranded RNA, and complements thereof.
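
By way of a non-limiting illustration of the primer step described above, the following sketch selects a forward primer from the bases immediately upstream of an ORF and a reverse primer from the reverse complement of the bases immediately downstream. The function names, the default primer length of 20 nt, and the 1-based inclusive coordinate convention are assumptions made for illustration only; practical primer design would also consider melting temperature and specificity.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def flanking_primers(contig, orf_start, orf_stop, primer_len=20):
    """Pick primers flanking an ORF (hypothetical helper, not from the source).

    Coordinates are assumed to be 1-based and inclusive, with
    orf_start < orf_stop (ORF on the listed strand).
    The forward primer is the primer_len bases just upstream of the ORF;
    the reverse primer is the reverse complement of the primer_len bases
    just downstream, so both primers read 5'->3'.
    """
    upstream_begin = orf_start - 1 - primer_len
    downstream_end = orf_stop + primer_len
    if upstream_begin < 0 or downstream_end > len(contig):
        raise ValueError("not enough flanking sequence in the contig")
    forward = contig[upstream_begin:orf_start - 1]
    reverse = reverse_complement(contig[orf_stop:downstream_end])
    return forward, reverse
```

The sketch raises an error rather than shortening a primer when an ORF lies too close to the end of a contig, since a truncated primer would silently change the amplified region.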

[0044] Tables 1 through 6 herein describe ORFs in the E. coli J96 PAI cosmid clone
library.

[0045] Tables 1 and 3 list, for PAI IV and PAI V, respectively, a number of ORFs that
putatively encode a recited protein based on homology matching with protein sequences
from an organism listed in the Table. Tables 1 and 3 indicate the location of each ORF (i.e.,
the position) by reference to its position within one of the 142 E. coli J96 contigs
described in SEQ ID NOs: 1 through 142. Column 1 of Tables 1 and 3 provides the
Sequence ID Number (SEQ ID NO) of the contig in which a particular open reading frame
is located. Column 2 numerically identifies a particular ORF on a particular contig (SEQ
ID NO) since many contigs comprise a plurality of ORFs. Columns 3 and 4 indicate an
ORF's position in the nucleotide sequence (contig) provided in SEQ ID NOs: 1 through
142 by referring to start and stop positions in the contig sequence. One of ordinary skill in
the art will appreciate that the ORFs may be oriented in opposite directions in the E. coli
chromosome. This is reflected in columns 3 and 4. Column 5 provides a database
accession number to a homologous protein identified by a similarity search of public
sequence databases (see, infra). Column 6 describes the matching protein sequence and
the source organism is identified in brackets. Column 7 of Tables 1 and 3 indicates the
percent identity of the protein sequence encoded by an ORF to the corresponding protein
sequence from the organism appearing in parentheses in the sixth column. Column 8 of
Tables 1 and 3 indicates the percent similarity of the protein sequence encoded by an ORF
to the corresponding protein sequence from the organism appearing in parentheses in the
sixth column. The concepts of percent identity and percent similarity of two polypeptide
sequences are well understood in the art and are described in more detail below. Identified
genes can frequently be assigned a putative cellular role category adapted from Riley (see,
Riley, M., Microbiol. Rev. 57:862 (1993)). Column 9 of Tables 1 and 3 provides the 
nucleotide length of the open reading frame. 
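
As a hedged sketch of how the first four columns might be used, the following example models one table row and recovers the ORF sequence from its contig. The class and field names are hypothetical. The sketch assumes that positions are 1-based and inclusive, and that an ORF on the opposite strand is recorded with a start position greater than its stop position; the Tables themselves should be consulted to confirm both conventions.

```python
from dataclasses import dataclass


@dataclass
class OrfRecord:
    """One row of an ORF table (field names are hypothetical)."""
    contig_id: int   # column 1: SEQ ID NO of the contig
    orf_id: int      # column 2: ORF number within that contig
    start: int       # column 3: start position in the contig (assumed 1-based)
    stop: int        # column 4: stop position in the contig (assumed 1-based)

    @property
    def on_reverse_strand(self):
        # Assumption: start > stop marks an ORF in the opposite orientation.
        return self.start > self.stop

    @property
    def length_nt(self):
        # Nucleotide length, counting both end positions.
        return abs(self.stop - self.start) + 1


def orf_sequence(contig, record):
    """Return the coding-strand sequence of an ORF from its contig."""
    low, high = sorted((record.start, record.stop))
    segment = contig[low - 1:high].upper()
    if record.on_reverse_strand:
        pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
        segment = "".join(pairs[base] for base in reversed(segment))
    return segment
```

Under these assumptions, an ORF recorded with start 11 and stop 3 is read from the reverse complement of contig positions 3 through 11, so that the returned sequence always begins at the start codon.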

[0046] Tables 2 and 4, below, provide ORFs of E. coli J96 PAI IV and PAI V, 
respectively, that did not elicit a homology match with a known sequence from either E. 
coli or another organism. As above, the first column in Tables 2 and 4 provides the contig 
in which the ORF is located and the second column numerically identifies a particular 
ORF in a particular contig. Columns 3 and 4 identify an ORF's position in one of SEQ ID
NOs: 1 through 142 by reference to start and stop nucleotides.

[0047] Tables 5 and 6, below, provide the E. coli J96 PAI IV ORFs and PAI V ORFs, 
respectively, identified by the present inventors that provided a significant match to a 
previously published E. coli protein. The columns correspond to the columns appearing in
Tables 1 and 3. 

[0048] Further details concerning the algorithms and criteria used for homology 
searches are provided in the Examples below. A skilled artisan can readily identify ORFs
in the Escherichia coli J96 cosmid library other than those listed in Tables 1 through 6, 
such as ORFs that are overlapping or encoded by the opposite strand of an identified ORF 
in addition to those ascertainable using the computer-based systems of the present 
invention. 

[0049] Isolated nucleic acid molecules of the present invention include DNA 
molecules having a nucleotide sequence substantially different than the nucleotide 
sequence of an ORF described in Tables 1 through 4, but which, due to the degeneracy of 
the genetic code, still encode an E. coli J96 PAI protein. The genetic code is well known in
the art. Thus, it would be routine to generate such degenerate variants. 
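
The following sketch illustrates, under stated assumptions, how a degenerate variant may be generated and checked. It uses the standard genetic code, swaps each codon for a different synonymous codon where one exists, and confirms that the translation is unchanged. The function names and the example sequence are hypothetical and are not drawn from the sequence listing.

```python
# Standard genetic code, built in TCAG order ('*' marks a stop codon).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO)
)


def translate(dna):
    """Translate a DNA sequence; any trailing partial codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def degenerate_variant(dna):
    """Swap each codon for a different synonymous codon where one exists."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    out = []
    for i in range(0, usable, 3):
        codon = dna[i:i + 3]
        synonyms = [c for c, aa in CODON_TABLE.items()
                    if aa == CODON_TABLE[codon] and c != codon]
        # Codons without synonyms (ATG, TGG) are kept unchanged.
        out.append(synonyms[0] if synonyms else codon)
    return "".join(out)


original = "ATGGCTAAATAA"              # hypothetical 4-codon ORF: M A K stop
variant = degenerate_variant(original)
assert variant != original
assert translate(variant) == translate(original)
```

Because only synonymous codons are exchanged, the nucleotide sequence changes while the encoded amino acid sequence does not, which is the property relied upon in the paragraph above.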
[0050] The present invention further relates to variants of the nucleic acid molecules 
of the present invention, which encode portions, analogs or derivatives of an E. coli
protein encoded by an ORF described in Tables 1 through 4. Non-naturally occurring
variants may be produced using art-known mutagenesis techniques and include those 
produced by nucleotide substitutions, deletions or additions. The substitutions, deletions 
or additions may involve one or more nucleotides. The variants may be altered in coding 
regions, non-coding regions, or both. Alterations in the coding regions may produce 
conservative or non-conservative amino acid substitutions, deletions or additions. 
Especially preferred among these are silent substitutions, additions and deletions, which 
do not alter the properties and activities of the E. coli protein or portions thereof. Also
especially preferred in this regard are conservative substitutions. 

[0051] Further embodiments of the invention include isolated nucleic acid molecules 
comprising a polynucleotide having a nucleotide sequence at least 90% identical, and 
more preferably at least 95%, 96%, 97%, 98% or 99% identical, to the nucleotide 
sequence of an ORF described in Tables 1 through 6, preferably 1 through 4. By a 
polynucleotide having a nucleotide sequence at least, for example, 95% identical to the 
reference E. coli ORF nucleotide sequence is intended that the nucleotide sequence of the 
polynucleotide is identical to the reference sequence except that the polynucleotide 
sequence may include up to five point mutations per each 100 nucleotides of the ORF 
sequence. In other words, to obtain a polynucleotide having a nucleotide sequence at least 
95% identical to a reference ORF nucleotide sequence, up to 5% of the nucleotides in the 
reference sequence may be deleted or substituted with another nucleotide, or a number of 
nucleotides up to 5% of the total nucleotides in the reference sequence may be inserted 
into the reference sequence. These mutations of the reference sequence may occur at the
5' or 3' terminal positions of the reference nucleotide sequence or anywhere between those 
terminal positions, interspersed either individually among nucleotides in the reference 
sequence or in one or more contiguous groups within the reference sequence. 
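
The arithmetic in the preceding paragraph can be sketched as follows. The first function computes the largest number of point differences that still satisfies a given identity threshold for a reference of a given length; the second computes percent identity for the simple case of two equal-length sequences that differ only by substitutions. Both function names are hypothetical, and insertions and deletions are outside the scope of this simplified sketch.

```python
from fractions import Fraction


def max_differences(reference_length, percent_identity):
    """Largest count of point differences keeping identity >= the threshold."""
    # Fraction avoids floating-point rounding for thresholds such as 99.9.
    threshold = Fraction(str(percent_identity)) / 100
    return int(reference_length * (1 - threshold))


def percent_identity_ungapped(reference, candidate):
    """Percent identity for equal-length sequences (substitutions only)."""
    if len(reference) != len(candidate):
        raise ValueError("this sketch handles equal-length sequences only")
    matches = sum(r == c for r, c in zip(reference.upper(), candidate.upper()))
    return 100.0 * matches / len(reference)
```

For example, a 300-nt reference ORF at a 95% threshold tolerates up to 15 point differences, consistent with the figure of up to five point mutations per each 100 nucleotides.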
[0052] As a practical matter, whether any particular nucleic acid molecule is at least 
90%, 95%, 96%, 97%, 98% or 99% identical to the nucleotide sequence of an E. coli J96
PAI ORF can be determined conventionally using known computer programs such as the 
Bestfit program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics 
Computer Group, University Research Park, 575 Science Drive, Madison, WI 53711). 
Bestfit uses the local homology algorithm of Smith and Waterman, Advances in Applied 
Mathematics 2: 482-489 (1981), to find the best segment of homology between two 
sequences. When using Bestfit or any other sequence alignment program to determine 
whether a particular sequence is, for instance, 95% identical to a reference sequence 
according to the present invention, the parameters are set, of course, such that the 
percentage of identity is calculated over the full length of the reference nucleotide 
sequence and that gaps in homology of up to 5% of the total number of nucleotides in the 
reference sequence are allowed. 
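
The following is a minimal sketch of a Smith-Waterman style local alignment with percent identity calculated over the full length of the reference sequence. It is not the Bestfit program and does not reproduce its parameters; the scoring values (match +1, mismatch -1, linear gap penalty -2) and the function names are assumptions chosen only to illustrate the calculation.

```python
def smith_waterman_matches(reference, query, match=1, mismatch=-1, gap=-2):
    """Return (best_score, identical_positions) for the best local alignment."""
    rows, cols = len(reference) + 1, len(query) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if reference[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + step,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell, counting identical aligned bases.
    i, j = best_pos
    identical = 0
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if reference[i - 1] == query[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            identical += int(reference[i - 1] == query[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1          # gap in the query
        else:
            j -= 1          # gap in the reference
    return best, identical


def percent_identity_over_reference(reference, query):
    """Identity of the best local alignment, as a percentage of reference length."""
    _, identical = smith_waterman_matches(reference.upper(), query.upper())
    return 100.0 * identical / len(reference)
```

Dividing by the reference length, rather than by the length of the aligned segment, reflects the requirement that identity be calculated over the full length of the reference nucleotide sequence; a short, perfectly matching query therefore does not score as 100% identical.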

[0053] Preferred are nucleic acid molecules having sequences at least 90%, 95%, 96%,
97%, 98% or 99% identical to the nucleic acid sequence of an E. coli J96 PAI ORF that 
encode a functional polypeptide. By a "functional polypeptide" is intended a polypeptide 
exhibiting activity similar, but not necessarily identical, to an activity of the protein 
encoded by the E. coli J96 PAI ORF. For example, the E. coli ORF [Contig ID 84, ORF 
ID 3 (84/3)] encodes a hemolysin. Thus, a functional polypeptide encoded by a nucleic 
acid molecule having a nucleotide sequence, for example, 95% identical to the nucleotide 
sequence of 84/3, will also possess hemolytic activity. As the skilled artisan will 
appreciate, assays for determining whether a particular polypeptide is functional will 
depend on which ORF is used as the reference sequence. Depending on the reference 
ORF, the assay chosen for measuring polypeptide activity will be readily apparent in light 
of the role categories provided in Tables 1, 3, 5 and 6.

[0054] Of course, due to the degeneracy of the genetic code, one of ordinary skill in 
the art will immediately recognize that a large number of the nucleic acid molecules 
having a sequence at least 90%, 95%, 96%, 97%, 98%, or 99% identical to the nucleic acid 
sequence of a reference ORF will encode a functional polypeptide. In fact, since 
degenerate variants all encode the same amino acid sequence, this will be clear to the 
skilled artisan even without performing a comparison assay for protein activity. It will be 
further recognized in the art that, for such nucleic acid molecules that are not degenerate 
variants, a reasonable number will also encode a functional polypeptide. This is because
the skilled artisan is fully aware of amino acid substitutions that are either less likely or
not likely to significantly affect protein function (e.g., replacing one aliphatic amino acid
with a second aliphatic amino acid). 

[0055] For example, guidance concerning how to make phenotypically silent amino 
acid substitutions is provided in Bowie, J. U. et al, "Deciphering the Message in Protein 
Sequences: Tolerance to Amino Acid Substitutions," Science 247:1306-1310 (1990),
wherein the authors indicate that there are two main approaches for studying the tolerance 
of an amino acid sequence to change. The first method relies on the process of evolution, 
in which mutations are either accepted or rejected by natural selection. The second 
approach uses genetic engineering to introduce amino acid changes at specific positions of 
a cloned gene and selections or screens to identify sequences that maintain functionality. 
As the authors state, these studies have revealed that proteins are surprisingly tolerant of
amino acid substitutions. The authors further indicate which amino acid changes are likely
to be permissive at a certain position of the protein. For example, most buried amino acid 
residues require nonpolar side chains, whereas few features of surface side chains are 
generally conserved. Other such phenotypically silent substitutions are described in 
Bowie, J.U. et al, supra, and the references cited therein. 

[0056] The present invention is further directed to fragments of the isolated nucleic 
acid molecules described herein. By a fragment of an isolated nucleic acid molecule 
having the nucleotide sequence of an E. coli J96 PAI ORF is intended fragments at least 
about 15 nt, and more preferably at least about 20 nt, still more preferably at least about 30 
nt, and even more preferably, at least about 40 nt in length that are useful as diagnostic 
probes and primers as discussed herein. Of course, larger fragments 50-500 nt in length 
are also useful according to the present invention as are fragments corresponding to most, 
if not all, of the nucleotide sequence of an E. coli J96 PAI ORF. By a fragment at least 20 
nt in length, for example, is intended fragments that include 20 or more contiguous bases 
from the nucleotide sequence of an E. coli J96 PAI ORF. Since E. coli ORFs are listed in 
Tables 1 through 6 and the sequences of the ORFs have been provided within the contig 
sequences of SEQ ID NOs: 1 through 142, generating such DNA fragments would be 
routine to the skilled artisan. For example, restriction endonuclease cleavage or shearing 
by sonication could easily be used to generate fragments of various sizes from the PAI 
DNA that is incorporated into the deposited pools of cosmid clones. Alternatively, such 
fragments could be generated synthetically. 
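
As a minimal sketch of what "20 or more contiguous bases" means in practice, the following Python example enumerates every fragment of a given length from a sequence. The 39-nt sequence is made up for illustration and does not correspond to any ORF of Tables 1 through 6.

```python
def contiguous_fragments(sequence, length):
    """Yield every fragment of `length` contiguous bases from `sequence`."""
    if length < 1:
        raise ValueError("length must be positive")
    for start in range(len(sequence) - length + 1):
        yield sequence[start:start + length]

# Made-up 39-nt sequence standing in for part of an ORF.
orf = "ATGACCGGTAAAGCTCTGGTTGAACGTATCCTGAAATAA"
probes = list(contiguous_fragments(orf, 20))

print(len(probes), probes[0])
```

A sequence of n bases yields n - 20 + 1 distinct start positions for 20-nt fragments, so even a short ORF provides many candidate probes or primers.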

[0057] Preferred nucleic acid fragments of the present invention include nucleic acid 
molecules encoding epitope-bearing portions of an E. coli J96 PAI protein. Methods for 
determining such epitope-bearing portions are described in detail below. 

[0058] In another aspect, the invention provides an isolated nucleic acid molecule 
comprising a polynucleotide that hybridizes under stringent hybridization conditions to a 
portion of the polynucleotide in a nucleic acid molecule of the invention described above, 
for instance, an ORF described in Tables 1 through 6, preferably an ORF described in 
Tables 1, 2, 3 or 4. By "stringent hybridization conditions" is intended overnight 
incubation at 42°C in a solution comprising: 50% formamide, 5 x SSC (750 mM NaCl, 
75 mM trisodium citrate), 50 mM sodium phosphate (pH 7.6), 5 x Denhardt's solution, 
10% dextran sulfate, and 20 µg/ml denatured, sheared salmon sperm DNA, followed by 
washing the filters in 0.1 x SSC at about 65°C. 
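
The salt concentrations implied by the hybridization and wash steps can be worked out from the 5 x SSC values quoted above. The following Python sketch is illustrative only; the function name is hypothetical, and the 1 x values are simply the quoted 5 x values divided by five.

```python
# 1 x SSC in mM, derived from the 5 x values quoted in the text
# (750 mM NaCl, 75 mM trisodium citrate).
SSC_1X_MM = {"NaCl": 750 / 5, "trisodium citrate": 75 / 5}

def ssc_concentrations(fold):
    """Return the mM concentration of each SSC component at `fold` x strength."""
    return {name: fold * mm for name, mm in SSC_1X_MM.items()}

hybridization = ssc_concentrations(5)   # strength used for overnight incubation
wash = ssc_concentrations(0.1)          # strength used for the 65°C wash

print(hybridization)
print(wash)
```

Under these assumptions the 0.1 x SSC wash contains roughly 15 mM NaCl and 1.5 mM trisodium citrate, a fifty-fold lower salt concentration than the hybridization solution, which is what makes the wash step stringent.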

[0059] By a polynucleotide that hybridizes to a "portion" of a polynucleotide is 
intended a polynucleotide (either DNA or RNA) hybridizing to at least about 15 
nucleotides (nt), and more preferably at least about 20 nt, still more preferably at least 
about 30 nt, and even more preferably about 30-70 nt of the reference polynucleotide. 
These are useful as diagnostic probes and primers as discussed above and in more detail 
below. 

[0060] Of course, polynucleotides hybridizing to a larger portion of the reference 
polynucleotide (e.g., an E. coli ORF), for instance, a portion 50-500 nt in length, or even to 
the entire length of the reference polynucleotide, are also useful as probes according to the 
present invention, as are polynucleotides corresponding to most, if not all, of an E. coli J96 
PAI ORF. 

[0061] By "expression modulating fragment" (EMF), is intended a series of 
nucleotides that modulate the expression of an operably linked ORF or EMF. A sequence 
is said to "modulate the expression of an operably linked sequence" when the expression 
of the sequence is altered by the presence of the EMF. EMFs include, but are not limited 
to, promoters, and promoter modulating sequences (inducible elements). One class of 
EMFs are fragments that induce the expression of an operably linked ORF in response to a 
specific regulatory factor or physiological event. EMF sequences can be identified within 
the E. coli genome by their proximity to the ORFs described in Tables 1 through 6. An 
intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 
nucleotides in length, taken 5' from any one of the ORFs of Tables 1 through 6 will 
modulate the expression of an operably linked 3' ORF in a fashion similar to that found 
with the naturally linked ORF sequence. As used herein, an "intergenic segment" refers to 
the fragments of the E. coli J96 PAI subgenome that are between two ORF(s) herein 
described. Alternatively, EMFs can be identified using known EMFs as a target sequence 
or target motif in the computer-based systems of the present invention. 
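
The selection of an intergenic segment 5' of an ORF can be sketched as follows. This Python example is a simplification under stated assumptions: it handles only ORFs on the forward strand, uses 0-based coordinates, and the function name, parameters, and placeholder contig are hypothetical rather than part of the computer-based systems referred to above.

```python
def upstream_segment(contig, orf_start, prev_orf_end, max_len=200, min_len=10):
    """Return up to max_len nt immediately 5' of a forward-strand ORF.

    contig       -- contig sequence as a string
    orf_start    -- 0-based index of the first base of the ORF
    prev_orf_end -- 0-based index one past the last base of the upstream ORF
    Returns None when the intergenic gap is shorter than min_len.
    """
    # Stay inside the intergenic gap and within max_len of the ORF start.
    gap_start = max(prev_orf_end, orf_start - max_len)
    segment = contig[gap_start:orf_start]
    return segment if len(segment) >= min_len else None

# Placeholder contig: upstream ORF (A), 30-nt intergenic gap (C), ORF (G).
contig = "A" * 50 + "C" * 30 + "G" * 60
print(upstream_segment(contig, orf_start=80, prev_orf_end=50))
```

The 10 and 200 nt defaults mirror the range given in the text; an ORF on the reverse strand would need the segment taken from the other side and reverse-complemented.
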

[0062] The presence and activity of an EMF can be confirmed using an EMF trap 
vector. An EMF trap vector contains a cloning site 5' to a marker sequence. A marker 
sequence encodes an identifiable phenotype, such as antibiotic resistance or a 
complementing nutrition auxotrophic factor, which can be identified or assayed when the 
EMF trap vector is placed within an appropriate host under appropriate conditions. As 
described above, an EMF will modulate the expression of an operably linked marker 
sequence. A more detailed discussion of various marker sequences is provided below. 

[0063] A sequence that is suspected of being an EMF is cloned in all three reading 
frames in one or more restriction sites upstream from the marker sequence in the EMF trap 
vector. The vector is then transformed into an appropriate host using known procedures 
and the phenotype of the transformed host is examined under appropriate conditions. As 
described above, an EMF will modulate the expression of an operably linked marker 
sequence. 

[0064] By a "diagnostic fragment" (DF), is intended a series of nucleotides that 
selectively hybridize to E. coli sequences. DFs can be readily identified by identifying 
unique sequences within the E. coli J96 PAI subgenome, or by generating and testing 
probes or amplification primers consisting of the DF sequence in an appropriate diagnostic 
format for amplification or hybridization selectivity. 

[0065] Each of the ORFs of the E. coli J96 PAI subgenome disclosed in Tables 1 
through 4, and the EMF found 5' to the ORF, can be used in numerous ways as 
polynucleotide reagents. The sequences can be used as diagnostic probes or diagnostic 
amplification primers to detect the presence of uropathogenic E. coli in a sample. This is 
especially the case with the fragments or ORFs of Tables 2 and 4, which will be highly 
selective for uropathogenic E. coli J96, and perhaps other uropathogenic or extraintestinal 
strains that include one or more PAIs. 

[0066] In addition, the fragments of the present invention, as broadly described, can be 
used to control gene expression through triple helix formation or antisense DNA or RNA, 
both of which methods are based on the binding of a polynucleotide sequence to DNA or 
RNA. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in 
length and are designed to be complementary to a region of the gene involved in 
transcription (triple helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., 
Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA 
itself (antisense - Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as 
Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). 

[0067] Triple helix formation optimally results in a shut-off of RNA transcription 
from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule 
into polypeptide. Both techniques have been demonstrated to be effective in model 
systems. Information contained in the sequences of the present invention is necessary for 
the design of an antisense or triple helix oligonucleotide. 
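
As a minimal sketch of how sequence information feeds the design of an antisense oligonucleotide, the following Python example returns the RNA oligo complementary to a chosen window of an mRNA. The mRNA shown is made up, the function name is hypothetical, and the example ignores practical design considerations such as secondary structure and target accessibility.

```python
# Watson-Crick complement table for RNA bases.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def antisense_rna(mrna, start, length=20):
    """Return the oligo complementary to mrna[start:start+length], written 5' to 3'."""
    if not 20 <= length <= 40:
        raise ValueError("the text suggests oligos of 20 to 40 bases")
    target = mrna[start:start + length]
    if len(target) < length:
        raise ValueError("target window runs past the end of the mRNA")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return target.translate(COMPLEMENT)[::-1]

mrna = "AUGGCUCUGAAAGGUACCGAUUAA"  # made-up 24-nt message
oligo = antisense_rna(mrna, start=0, length=20)
print(oligo)
```

Taking the reverse complement of the returned oligo recovers the targeted window, which is a convenient check that the oligo would pair with the intended region.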

Vectors and Host Cells 

[0068] The present invention further provides recombinant constructs comprising one 
or more fragments of the E. coli J96 PAIs. The recombinant constructs of the present 
invention comprise a vector, such as a plasmid or viral vector, into which, for example, an 
E. coli J96 PAI ORF is inserted. The vector may further comprise regulatory sequences, 
including for example, a promoter, operably linked to the ORF. For vectors comprising 
the EMFs of the present invention, the vector may further comprise a marker sequence or 
heterologous ORF operably linked to the EMF. Large numbers of suitable vectors and 
promoters are known to those of skill in the art and are commercially available for 
generating the recombinant constructs of the present invention. The following vectors are 
provided by way of example. Bacterial: pBs, phagescript, PsiX174, pBluescript SK, pBs 
KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, 
pDR540, pRIT5 (Pharmacia). Eukaryotic: pWLneo, pSV2cat, pOG44, pXT1, pSG 
(Stratagene); pSVK3, pBPV, pMSG, pSVL (Pharmacia). 

[0069] Promoter regions can be selected from any desired gene using CAT 
(chloramphenicol transferase) vectors or other vectors with selectable markers. Two 
appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters 
include lacI, lacZ, T3, T7, gpt, lambda P , and trc. Eukaryotic promoters include CMV 
immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and 
mouse metallothionein-I. Selection of the appropriate vector and promoter is well within 
the level of ordinary skill in the art. 

[0070] The present invention further provides host cells containing any one of the 
isolated fragments (preferably an ORF) of the E. coli J96 PAIs described herein. The host 
cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic 
host cell, such as a yeast cell, or the host cell can be a procaryotic cell, such as a bacterial 
cell. Introduction of the recombinant construct into the host cell can be effected by 
calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation 
(Davis, L. et al., Basic Methods in Molecular Biology (1986)). Host cells containing, for 
example, an E. coli J96 PAI ORF can be used conventionally to produce the encoded 
protein. 

Polypeptides and Fragments 

[0071] The invention further provides isolated polypeptides having the amino acid 
sequence encoded by an E. coli PAI ORF described in Tables 1 through 6, preferably 
Tables 1 through 4, or a peptide or polypeptide comprising a portion of the above 
polypeptides. The terms "peptide" and "oligopeptide" are considered synonymous (as is 
commonly recognized) and each term can be used interchangeably as the context requires 
to indicate a chain of at least two amino acids coupled by peptidyl linkages. The word 
"polypeptide" is used herein for chains containing more than ten amino acid residues. All 
oligopeptide and polypeptide formulas or sequences herein are written from left to right 
and in the direction from amino terminus to carboxy terminus. 

[0072] It will be recognized in the art that some amino acid sequences of E. coli 
polypeptides can be varied without significant effect on the structure or function of the 
protein. If such differences in sequence are contemplated, it should be remembered that 
there will be critical areas on the protein which determine activity. In general, it is 
possible to replace residues that form the tertiary structure, provided that residues 
performing a similar function are used. In other instances, the type of residue may be 
completely unimportant if the alteration occurs at a non-critical region of the protein. 

[0073] Thus, the invention further includes variations of polypeptides encoded for by 
ORFs listed in Tables 1 through 6 which show substantial pathogenic activity or which 
include regions of particular E. coli PAI proteins such as the protein portions discussed 
below. Such mutants include deletions, insertions, inversions, repeats, and type 
substitutions (for example, substituting one hydrophilic residue for another, but not 
strongly hydrophilic for strongly hydrophobic as a rule). Small changes or such "neutral" 
amino acid substitutions will generally have little effect on activity. 

[0074] Typically seen as conservative substitutions are the replacements, one for 
another, among the aliphatic amino acids Ala, Val, Leu and Ile; interchange of the 
hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu; substitution 
between the amide residues Asn and Gln; exchange of the basic residues Lys and Arg; and 
replacements among the aromatic residues Phe and Tyr. 

[0075] As indicated in detail above, further guidance concerning which amino acid 
changes are likely to be phenotypically silent (i.e., are not likely to have a significant 
deleterious effect on a function) can be found in Bowie, J.U., et al., "Deciphering the 
Message in Protein Sequences: Tolerance to Amino Acid Substitutions," Science 
247:1306-1310 (1990). 

[0076] Thus, the fragment, derivative or analog of a polypeptide encoded by an ORF 
described in one of Tables 1 through 6, may be (i) one in which one or more of the amino 
acid residues are substituted with a conserved or non-conserved amino acid residue 
(preferably a conserved amino acid residue) and such substituted amino acid residue may 
or may not be one encoded by the genetic code, or (ii) one in which one or more of the 
amino acid residues includes a substituent group, or (iii) one in which the mature 
polypeptide is fused with another compound, such as a compound to increase the half-life 
of the polypeptide (for example, polyethylene glycol), or (iv) one in which the additional 
amino acids are fused to the mature polypeptide, such as an IgG Fc fusion region peptide 
or leader or secretory sequence or a sequence which is employed for purification of the 
mature polypeptide or a proprotein sequence. Such fragments, derivatives and analogs are 
deemed to be within the scope of those skilled in the art from the teachings herein. 
[0077] Of particular interest are substitutions of charged amino acids with another 
charged amino acid and with neutral or negatively charged amino acids. The latter results 
in proteins with reduced positive charge to improve the characteristics of said proteins. 
The prevention of aggregation is highly desirable. Aggregation of proteins not only results 
in a loss of activity but can also be problematic when preparing pharmaceutical 
formulations, because the aggregates can be immunogenic (Pinckard et al., Clin. Exp. Immunol. 
2:331-340 (1967); Robbins et al., Diabetes 36:838-845 (1987); Cleland et al., Crit. Rev. 
Therapeutic Drug Carrier Systems 10:307-377 (1993)). 

[0078] The replacement of amino acids can also change the selectivity of binding to 
cell surface receptors. Ostade et al., Nature 361:266-268 (1993), describe certain 
mutations resulting in selective binding of TNF-α to only one of the two known types of 
TNF receptors. Thus, proteins encoded for by the ORFs listed in Tables 1, 2, 3, 4, 5, or 6, 
and that bind to a cell surface receptor, may include one or more amino acid substitutions, 
deletions or additions, either from natural mutations or human manipulation. 
[0079] As indicated, changes are preferably of a minor nature, such as conservative 
amino acid substitutions that do not significantly affect the folding or activity of the 
protein (see Table 7). 

TABLE 7. Conservative Amino Acid Substitutions

Aromatic       Phenylalanine
               Tryptophan
               Tyrosine

Hydrophobic    Leucine
               Isoleucine
               Valine

Polar          Glutamine
               Asparagine

Basic          Arginine
               Lysine
               Histidine

Acidic         Aspartic Acid
               Glutamic Acid

Small          Alanine
               Serine
               Threonine
               Methionine
               Glycine
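
The groupings of Table 7 lend themselves to a simple lookup. The following Python sketch, offered for illustration only, encodes the six groups with one-letter amino acid codes and reports whether a given substitution stays within a group; the function name is hypothetical, and the check says nothing about whether a particular substitution is tolerated at a particular position of a particular protein.

```python
# Table 7 groups, written with one-letter amino acid codes.
TABLE_7_GROUPS = {
    "Aromatic": {"F", "W", "Y"},
    "Hydrophobic": {"L", "I", "V"},
    "Polar": {"Q", "N"},
    "Basic": {"R", "K", "H"},
    "Acidic": {"D", "E"},
    "Small": {"A", "S", "T", "M", "G"},
}

def is_conservative(original, replacement):
    """True if the two different residues fall in the same Table 7 group."""
    if original == replacement:
        return False  # not a substitution at all
    return any(original in group and replacement in group
               for group in TABLE_7_GROUPS.values())

print(is_conservative("L", "I"))  # both Hydrophobic
print(is_conservative("D", "K"))  # Acidic versus Basic
```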


[0080] Amino acids in the proteins encoded by ORFs of the present invention that are 
essential for function can be identified by methods known in the art, such as site-directed 
mutagenesis or alanine-scanning mutagenesis (Cunningham and Wells, Science 244:1081-
1085 (1989)). The latter procedure introduces single alanine mutations at every residue in 
the molecule. The resulting mutant molecules are then tested for biological activity such 
as receptor binding or in vitro proliferative activity. Sites that are critical for 
ligand-receptor binding can also be determined by structural analysis such as 
crystallization, nuclear magnetic resonance or photoaffinity labeling (Smith et al., J. Mol. 
Biol. 224:899-904 (1992) and de Vos et al., Science 255:306-312 (1992)). 
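
The bookkeeping behind alanine-scanning mutagenesis can be sketched as follows. This Python example only generates the set of single-alanine mutant sequences and conventional mutation labels for a hypothetical four-residue peptide; the function name is an assumption, and the activity testing of each mutant described above is a laboratory step outside the scope of the sketch.

```python
def alanine_scan(protein):
    """Return (label, mutant) pairs with one residue at a time replaced by Ala."""
    mutants = []
    for i, residue in enumerate(protein):
        if residue == "A":
            continue  # already alanine; nothing to substitute
        label = f"{residue}{i + 1}A"  # e.g. K2A: Lys at position 2 changed to Ala
        mutants.append((label, protein[:i] + "A" + protein[i + 1:]))
    return mutants

for label, mutant in alanine_scan("MKAL"):  # hypothetical peptide
    print(label, mutant)
```
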
[0081] The polypeptides of the present invention are preferably provided in an isolated 
form, and preferably are substantially purified. A recombinantly produced version of the 
polypeptides can be substantially purified by the one-step method described in Smith and 
Johnson, Gene 67:31-40 (1988). 

[0082] The polypeptides of the present invention include the polypeptide encoded by 
the ORFs listed in Tables 1-6, preferably Tables 1-4, as well as polypeptides which have at 
least 90% similarity, more preferably at least 95% similarity, and still more preferably at 
least 96%, 97%, 98% or 99% similarity to those described above, and also include 
portions of such polypeptides with at least 30 amino acids and more preferably at least 50 
amino acids. 

[0083] By "% similarity" for two polypeptides is intended a similarity score produced 
by comparing the amino acid sequences of the two polypeptides using the Bestfit program 
(Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics Computer Group, 
University Research Park, 575 Science Drive, Madison, WI 53711) and the default 
settings for determining similarity. Bestfit uses the local homology algorithm of Smith 
and Waterman (Advances in Applied Mathematics 2:482-489, 1981) to find the best 
segment of similarity between two sequences. 
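
For readers unfamiliar with the local homology algorithm, a minimal Python sketch of Smith-Waterman local alignment follows. It is not the Bestfit program and does not use Bestfit's default settings: the scoring values (match +2, mismatch -1, linear gap penalty -2) and the test sequences are assumptions chosen only to keep the example small.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("XXMKTAYIAKQRZZ", "PPMKTAYIAKQRWW"))
```

The algorithm reports only the best-scoring shared segment, which is why flanking residues that do not match are left out of the alignment.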

[0084] By a polypeptide having an amino acid sequence at least, for example, 95% 
"identical" to a reference amino acid sequence of a polypeptide is intended that the amino 
acid sequence of the polypeptide is identical to the reference sequence except that the 
polypeptide sequence may include up to five amino acid alterations per 100 amino 
acids of the reference amino acid sequence. In other words, to obtain a 
polypeptide having an amino acid sequence at least 95% identical to a reference amino 
acid sequence, up to 5% of the amino acid residues in the reference sequence may be 
deleted or substituted with another amino acid, or a number of amino acids up to 5% of the 
total amino acid residues in the reference sequence may be inserted into the reference 
sequence. These alterations of the reference sequence may occur at the amino or carboxy 
terminal positions of the reference amino acid sequence or anywhere between those 
terminal positions, interspersed either individually among residues in the reference 
sequence or in one or more contiguous groups within the reference sequence. 
[0085] As a practical matter, whether any particular polypeptide is at least 90%, 95%, 
96%, 97%, 98% or 99% identical to, for instance, the amino acid sequence encoded by the 
ORFs listed in Tables 1, 2, 3, 4, 5, or 6 can be determined conventionally using known 
computer programs such as the Bestfit program (Wisconsin Sequence Analysis Package, 
Version 8 for Unix, Genetics Computer Group, University Research Park, 575 Science 
Drive, Madison, WI 53711). When using Bestfit or any other sequence alignment program 
to determine whether a particular sequence is, for instance, 95% identical to a reference 
sequence according to the present invention, the parameters are set, of course, such that the 
percentage of identity is calculated over the full length of the reference amino acid 
sequence and that gaps in homology of up to 5% of the total number of amino acid 
residues in the reference sequence are allowed. 
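
The arithmetic behind the identity thresholds can be made concrete with a short Python sketch. The function names are hypothetical; the calculation simply restates the definition above, in which identity is computed over the full length of the reference sequence and a threshold of 95% permits up to five alterations per 100 reference residues.

```python
def max_alterations(reference_length, min_identity_pct):
    """Largest whole number of altered residues that still meets the threshold.

    Both arguments are expected to be integers.
    """
    return (reference_length * (100 - min_identity_pct)) // 100

def identity_over_reference(identical_residues, reference_length):
    """Percent identity calculated over the full length of the reference."""
    return 100.0 * identical_residues / reference_length

print(max_alterations(100, 95))   # alterations allowed in a 100-residue reference
print(max_alterations(250, 95))   # alterations allowed in a 250-residue reference
print(identity_over_reference(238, 250))
```

For a 250-residue reference, twelve alterations leave 238 identical residues (95.2% identity), whereas a thirteenth alteration would drop the figure below 95%.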

[0086] The polypeptide of the present invention could be used as a molecular weight 
marker on SDS-PAGE gels or on molecular sieve gel filtration columns using methods 
well known to those of skill in the art. 

[0087] As described in detail below, the polypeptides of the present invention can also 
be used to raise polyclonal and monoclonal antibodies, which are useful in assays for 
detecting pathogenic protein expression as described below or as agonists and antagonists 
capable of enhancing or inhibiting protein function of important proteins encoded by the 
ORFs of the present invention. Further, such polypeptides can be used in the yeast 
two-hybrid system to "capture" protein binding proteins which are also candidate agonists 
and antagonists according to the present invention. The yeast two-hybrid system is 
described in Fields and Song, Nature 340:245-246 (1989). 

[0088] In another aspect, the invention provides a peptide or polypeptide comprising 
an epitope-bearing portion of a polypeptide of the invention. The epitope of this 
polypeptide portion is an immunogenic or antigenic epitope of a polypeptide of the 
invention. An "immunogenic epitope" is defined as a part of a protein that elicits an 
antibody response when the whole protein is the immunogen. These immunogenic 
epitopes are believed to be confined to a few loci on the molecule. On the other hand, a 
region of a protein molecule to which an antibody can bind is defined as an "antigenic 
epitope." The number of immunogenic epitopes of a protein generally is less than the 
number of antigenic epitopes. See, for instance, Geysen et al., Proc. Natl. Acad. Sci. USA 
81:3998-4002 (1983). 

[0089] As to the selection of peptides or polypeptides bearing an antigenic epitope 
(i.e., that contain a region of a protein molecule to which an antibody can bind), it is well 
known in the art that relatively short synthetic peptides that mimic part of a protein 
sequence are routinely capable of eliciting an antiserum that reacts with the partially 
mimicked protein. See, for instance, Sutcliffe, J. G., Shinnick, T. M., Green, N. and 
Lerner, R.A. (1983) Antibodies that react with predetermined sites on proteins. Science 
219:660-666. Peptides capable of eliciting protein-reactive sera are frequently represented 
in the primary sequence of a protein, can be characterized by a set of simple chemical 
rules, and are confined neither to immunodominant regions of intact proteins (i.e., 
immunogenic epitopes) nor to the amino or carboxyl terminals. Peptides that are 
extremely hydrophobic and those of six or fewer residues generally are ineffective at 
inducing antibodies that bind to the mimicked protein; longer peptides, especially those 
containing proline residues, usually are effective. Sutcliffe et al., supra, at 661. For 
instance, 18 of 20 peptides designed according to these guidelines, containing 8-39 
residues covering 75% of the sequence of the influenza virus hemagglutinin HA1 
polypeptide chain, induced antibodies that reacted with the HA1 protein or intact virus; 
and 12/12 peptides from the MuLV polymerase and 18/18 from the rabies glycoprotein 
induced antibodies that precipitated the respective proteins. 

[0090] Antigenic epitope-bearing peptides and polypeptides of the invention are 
therefore useful to raise antibodies, including monoclonal antibodies that bind specifically 
to a polypeptide of the invention. Thus, a high proportion of hybridomas obtained by 
fusion of spleen cells from donors immunized with an antigen epitope-bearing peptide 
generally secrete antibody reactive with the native protein. Sutcliffe et al, supra, at 663. 
The antibodies raised by antigenic epitope-bearing peptides or polypeptides are useful to 
detect the mimicked protein, and antibodies to different peptides may be used for tracking 
the fate of various regions of a protein precursor which undergoes post-translational 
processing. The peptides and anti-peptide antibodies may be used in a variety of 
qualitative or quantitative assays for the mimicked protein, for instance in competition 
assays since it has been shown that even short peptides (e.g., about 9 amino acids) can 
bind and displace the larger peptides in immunoprecipitation assays. See, for instance, 
Wilson et al., Cell 37:767-778 (1984) at 777. The anti-peptide antibodies of the invention 
also are useful for purification of the mimicked protein, for instance, by adsorption 
chromatography using methods well known in the art. 

[0091] Antigenic epitope-bearing peptides and polypeptides of the invention designed 
according to the above guidelines preferably contain a sequence of at least seven, more 
preferably at least nine and most preferably between about 15 to about 30 amino acids 
contained within the amino acid sequence of a polypeptide of the invention. However, 
peptides or polypeptides comprising a larger portion of an amino acid sequence of a 
polypeptide of the invention, containing about 30 to about 50 amino acids, or any length 
up to and including the entire amino acid sequence of a polypeptide of the invention, also 
are considered epitope-bearing peptides or polypeptides of the invention and also are 
useful for inducing antibodies that react with the mimicked protein. Preferably, the amino 
acid sequence of the epitope-bearing peptide is selected to provide substantial solubility in 
aqueous solvents (i.e., the sequence includes relatively hydrophilic residues and highly 
hydrophobic sequences are preferably avoided); and sequences containing proline residues 
are particularly preferred. 
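
The selection guidelines above (a window of about 15 to 30 residues, relatively hydrophilic, preferably containing proline) can be sketched as a simple screen. The following Python example is illustrative only: the use of the Kyte-Doolittle hydropathy scale, the cutoff of 0.0, the function names, and the 30-residue test sequence are all assumptions that do not come from the text, and a real design would weigh further factors.

```python
# Kyte-Doolittle hydropathy values (higher means more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy of a peptide."""
    return sum(KYTE_DOOLITTLE[r] for r in peptide) / len(peptide)

def candidate_peptides(protein, length=15, max_hydropathy=0.0):
    """Return (start, peptide, hydropathy, has_proline) for hydrophilic windows.

    Proline-containing windows are listed first, most hydrophilic first.
    """
    candidates = []
    for start in range(len(protein) - length + 1):
        peptide = protein[start:start + length]
        score = mean_hydropathy(peptide)
        if score <= max_hydropathy:
            candidates.append((start, peptide, score, "P" in peptide))
    candidates.sort(key=lambda c: (not c[3], c[2]))
    return candidates

# Made-up sequence: a hydrophobic stretch followed by a hydrophilic one.
protein = "LLLLLLLLLLLLLLL" + "KDEPSRKDEPSRKDE"
ranked = candidate_peptides(protein)
print(ranked[0])
```

With this made-up sequence the highest-ranked window is the fully hydrophilic, proline-containing stretch, while windows dominated by leucine are screened out.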

[0092] The epitope-bearing peptides and polypeptides of the invention may be 
produced by any conventional means for making peptides or polypeptides including 
recombinant means using nucleic acid molecules of the invention. For instance, a short 
epitope-bearing amino acid sequence may be fused to a larger polypeptide, which acts as a 
carrier during recombinant production and purification, as well as during immunization to 
produce anti-peptide antibodies. Epitope-bearing peptides also may be synthesized using 
known methods of chemical synthesis. For instance, Houghten has described a simple 
method for synthesis of large numbers of peptides, such as 10-20 mg of 248 different 13 
residue peptides representing single amino acid variants of a segment of the HA1 
polypeptide which were prepared and characterized (by ELISA-type binding studies) in 
less than four weeks. Houghten, R. A. (1985) General method for the rapid solid-phase 
synthesis of large numbers of peptides: specificity of antigen-antibody interaction at the 
level of individual amino acids. Proc. Natl. Acad. Sci. USA 82:5131-5135. This 
"Simultaneous Multiple Peptide Synthesis (SMPS)" process is further described in U.S. 
Patent No. 4,631,211 to Houghten et al. (1986). In this procedure the individual resins for 
the solid-phase synthesis of various peptides are contained in separate solvent-permeable 
packets, enabling the optimal use of the many identical repetitive steps involved in 
solid-phase methods. A completely manual procedure allows 500-1000 or more syntheses 
to be conducted simultaneously. Houghten et al., supra, at 5134. 

[0093] Epitope-bearing peptides and polypeptides of the invention are used to induce 
antibodies according to methods well known in the art. See, for instance, Sutcliffe et al., 
supra; Wilson et al., supra; Chow, M. et al., Proc. Natl. Acad. Sci. USA 82:910-914; and 
Bittle, F. J. et al., J. Gen. Virol. 66:2347-2354 (1985). Generally, animals may be 
immunized with free peptide; however, anti-peptide antibody titer may be boosted by 
coupling of the peptide to a macromolecular carrier, such as keyhole limpet hemocyanin 
(KLH) or tetanus toxoid. For instance, peptides containing cysteine may be coupled to 
carrier using a linker such as m-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS), 
while other peptides may be coupled to carrier using a more general linking agent such as 
glutaraldehyde. Animals such as rabbits, rats and mice are immunized with either free or 
carrier-coupled peptides, for instance, by intraperitoneal and/or intradermal injection of 
emulsions containing about 100 µg peptide or carrier protein and Freund's adjuvant. 
Several booster injections may be needed, for instance, at intervals of about two weeks, to 
provide a useful titer of anti-peptide antibody, which can be detected, for example, by 
ELISA assay using free peptide adsorbed to a solid surface. The titer of anti-peptide 
antibodies in serum from an immunized animal may be increased by selection of 
anti-peptide antibodies, for instance, by adsorption to the peptide on a solid support and 
elution of the selected antibodies according to methods well known in the art. 
[0094] Immunogenic epitope-bearing peptides of the invention, i.e., those parts of a 
protein that elicit an antibody response when the whole protein is the immunogen, are 
identified according to methods known in the art. For instance, Geysen et al, supra, 
discloses a procedure for rapid concurrent synthesis on solid supports of hundreds of 
peptides of sufficient purity to react in an enzyme-linked immunosorbent assay. 
Interaction of synthesized peptides with antibodies is then easily detected without 
removing them from the support. In this manner a peptide bearing an immunogenic 
epitope of a desired protein may be identified routinely by one of ordinary skill in the art. 
For instance, the immunologically important epitope in the coat protein of foot-and-mouth 
disease virus was located by Geysen et al., supra, with a resolution of seven amino acids by 
synthesis of an overlapping set of all 208 possible hexapeptides covering the entire 213 
amino acid sequence of the protein. Then, a complete replacement set of peptides in 
which all 20 amino acids were substituted in turn at every position within the epitope was 
synthesized, and the particular amino acids conferring specificity for the reaction with 
antibody were determined. Thus, peptide analogs of the epitope-bearing peptides of the 
invention can be made routinely by this method. U.S. Patent No. 4,708,781 to Geysen 
(1987) further describes this method of identifying a peptide bearing an immunogenic 
epitope of a desired protein. 
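
The peptide counts quoted above follow directly from the window arithmetic, as the following Python sketch shows. The 213-residue placeholder sequence and the six-residue example epitope are hypothetical stand-ins, since the coat protein sequence itself is not reproduced in this document, and the function names are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def overlapping_peptides(protein, length=6):
    """Every peptide of `length` residues, each offset by one from the last."""
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]

def replacement_set(epitope):
    """All peptides with each of the 20 residues placed in turn at every position."""
    return [epitope[:i] + aa + epitope[i + 1:]
            for i in range(len(epitope))
            for aa in AMINO_ACIDS]

placeholder_protein = "A" * 213            # stands in for a 213-residue protein
hexapeptides = overlapping_peptides(placeholder_protein)
variants = replacement_set("GDLQVL")       # hypothetical six-residue epitope

print(len(hexapeptides))  # 213 - 6 + 1 overlapping hexapeptides
print(len(variants))      # 6 positions x 20 residues
```

Because all 20 residues are placed at each position, the replacement set includes the unmodified parent sequence once per position.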

[0095] Further still, U.S. Patent No. 5,194,392 to Geysen (1990) describes a general 
method of detecting or determining the sequence of monomers (amino acids or other 
compounds) which is a topological equivalent of the epitope (i.e., a "mimotope") which is 
complementary to a particular paratope (antigen binding site) of an antibody of interest. 
More generally, U.S. Patent No. 4,433,092 to Geysen (1989) describes a method of 
detecting or determining a sequence of monomers which is a topographical equivalent of a 
ligand which is complementary to the ligand binding site of a particular receptor of 
interest. Similarly, U.S. Patent No. 5,480,971 to Houghten, R. A. et al. (1996) on 
Peralkylated Oligopeptide Mixtures discloses linear C -C -alkyl peralkylated 
oligopeptides and sets and libraries of such peptides, as well as methods for using such 
oligopeptide sets and libraries for determining the sequence of a peralkylated oligopeptide 
that preferentially binds to an acceptor molecule of interest. Thus, non-peptide analogs of 
the epitope-bearing peptides of the invention also can be made routinely by these methods. 

[0096] The entire disclosure of each document cited in this section on "Polypeptides 
and Peptides" is hereby incorporated herein by reference. 

[0097] As one of skill in the art will appreciate, E. coli PAI polypeptides of the present
invention and the epitope-bearing fragments thereof described above can be combined 
with parts of the constant domain of immunoglobulins (IgG), resulting in chimeric 
polypeptides. These fusion proteins facilitate purification and show an increased half-life 
in vivo. This has been shown, e.g., for chimeric proteins consisting of the first two 
domains of the human CD4-polypeptide and various domains of the constant regions of 
the heavy or light chains of mammalian immunoglobulins (EP A 394,827; Traunecker et
al., Nature 331:84-86 (1988)). Fusion proteins that have a disulfide-linked dimeric
structure due to the IgG part can also be more efficient in binding and neutralizing other
molecules than the monomeric E. coli J96 PAI proteins or protein fragments alone
(Fountoulakis et al., J. Biochem. 270:3958-3964 (1995)).

Vaccines

[0098] In another embodiment, the present invention relates to a vaccine, preferably in 
unit dosage form, comprising one or more E. coli J96 PAI antigens together with a
pharmaceutically acceptable diluent, carrier, or excipient, wherein the one or more
antigens are present in an amount effective to elicit a protective immune response in an
animal to pathogenic E. coli. Antigens of E. coli J96 PAI IV and V may be obtained from
polypeptides encoded by the ORFs listed in Tables 1-6, particularly Tables 1-4, using
methods well known in the art. 

[0099] In a preferred embodiment, the antigens are E. coli J96 PAI IV or PAI V 
proteins that are present on the surface of pathogenic E. coli. In another preferred
embodiment, the pathogenic E. coli J96 PAI IV or PAI V protein-antigen is conjugated to
an E. coli capsular polysaccharide (CP), particularly to capsular polysaccharides that are more
prevalent in pathogenic strains, to produce a double vaccine. CPs, in general, may be
prepared or synthesized as described in Schneerson et al., J. Exp. Med. 152:361-376
(1980); Marburg et al., J. Am. Chem. Soc. 108:5282 (1986); Jennings et al., J. Immunol.
127:1011-1018 (1981); and Beuvery et al., Infect. Immun. 40:39-45 (1983). In a further
preferred embodiment, the present invention relates to a method of preparing a
polysaccharide conjugate comprising: obtaining the above-described E. coli J96 PAI
antigen; obtaining a CP or fragment from pathogenic E. coli; and conjugating the antigen 
to the CP or CP fragment. 

[0100] In a preferred embodiment, the animal to be protected is selected from the 
group consisting of humans, horses, deer, cattle, pigs, sheep, dogs, and chickens. In a more 
preferred embodiment, the animal is a human or a dog.

[0101] In a further embodiment, the present invention relates to a prophylactic method 
whereby the incidence of pathogenic E. coli-induced symptoms is decreased in an
animal, comprising administering to the animal the above-described vaccine, wherein the
vaccine is administered in an amount effective to elicit protective antibodies in an animal
to pathogenic E. coli. This vaccination method is contemplated to be useful in protecting
against severe diarrhea (pathogenic intestinal E. coli strains), urinary tract infections
(uropathogenic E. coli) and infections of the brain (extraintestinal E. coli). The vaccine of
the invention is used in an effective amount depending on the route of administration. 
Although intra-nasal, subcutaneous or intramuscular routes of administration are preferred, 
the vaccine of the present invention can also be administered by an oral, intraperitoneal or 
intravenous route. One skilled in the art will appreciate that the amounts to be 
administered for any particular treatment protocol can be readily determined without 
undue experimentation. Suitable amounts are within the range of 2 micrograms of the
protein per kg body weight to 100 micrograms per kg body weight. 
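
As a purely arithmetical illustration of the range stated above, the following sketch scales the 2 to 100 microgram per kg bounds by body weight. The function name and the 70 kg example are assumptions introduced here; the sketch restates the quoted range and is not a treatment protocol.

```python
MIN_UG_PER_KG = 2.0    # lower bound stated in the text, micrograms of protein per kg
MAX_UG_PER_KG = 100.0  # upper bound stated in the text, micrograms of protein per kg


def protein_amount_range_ug(body_weight_kg):
    """Return (low, high) micrograms of protein for the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    return (MIN_UG_PER_KG * body_weight_kg, MAX_UG_PER_KG * body_weight_kg)


# Hypothetical 70 kg subject: the stated range corresponds to 140 to 7000 micrograms.
low_ug, high_ug = protein_amount_range_ug(70)
```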

[0102] The vaccine can be delivered through a vector such as BCG. The vaccine can 
also be delivered as naked DNA coding for target antigens. 

[0103] The vaccine of the present invention may be employed in such dosage forms as 
capsules, liquid solutions, suspensions or elixirs for oral administration, or sterile liquid 
forms such as solutions or suspensions. Any inert carrier is preferably used, such as 
saline, phosphate-buffered saline, or any such carrier in which the vaccine has suitable 
solubility properties. The vaccines may be in the form of single dose preparations or in 
multi-dose flasks which can be used for mass vaccination programs. Reference is made to 
Remington's Pharmaceutical Sciences, Mack Publishing Co., Easton, PA, Osol (ed.)
(1980); and New Trends and Developments in Vaccines, Voller et al. (eds.), University
Park Press, Baltimore, MD (1978), for methods of preparing and using vaccines.

[0104] The vaccines of the present invention may further comprise adjuvants which
enhance production of antibodies and immune cells. Such adjuvants include, but are not
limited to, various oil formulations such as Freund's complete adjuvant (CFA), the
dipeptide known as MDP, saponins (e.g., Quillaja saponin fraction QA-21, U.S. Patent No.
5,047,540), aluminum hydroxide, or lymphatic cytokines. Freund's adjuvant is an
emulsion of mineral oil and water which is mixed with the immunogenic substance.
Although Freund's adjuvant is powerful, it is usually not administered to humans. Instead,
the adjuvant alum (aluminum hydroxide) may be used for administration to a human.
Vaccine may be adsorbed onto the aluminum hydroxide from which it is slowly released
after injection. The vaccine may also be encapsulated within liposomes according to 
Fullerton, U.S. Patent No. 4,235,877. 

Protein Function 

[0105] Each ORF described in Tables 1 and 3 possesses a biological role similar to the 
role associated with the identified homologous protein. This allows the skilled artisan to 
determine a function for each identified coding sequence. For example, a partial list of the 
E. coli protein functions provided in Tables 1 and 3 includes many of the functions
associated with virulence of pathogenic bacterial strains. These include, but are not
limited to, adhesins, excretion pathway proteins, O-antigen/carbohydrate modification,
cytotoxins and regulators. A more detailed description of several of these functions is 
provided in Example 1 below. 

Diagnostic Assays

[0106] In another preferred embodiment, the present invention relates to a method of 
detecting pathogenic E. coli nucleic acid in a sample comprising: 

(a) contacting the sample with the above-described nucleic acid probe, under
conditions such that hybridization occurs, and

(b) detecting the presence of the probe bound to pathogenic E. coli nucleic
acid.

[0107] In another preferred embodiment, the present invention relates to a diagnostic 
kit for detecting the presence of pathogenic E. coli nucleic acid in a sample comprising at
least one container means having disposed therein the above-described nucleic acid probe.

[0108] In another preferred embodiment, the present invention relates to a diagnostic
kit for detecting the presence of pathogenic E. coli antigens in a sample comprising at least
one container means having disposed therein the above-described antibodies.

[0109] In another preferred embodiment, the present invention relates to a diagnostic
kit for detecting the presence of antibodies to pathogenic E. coli antigens in a sample 
comprising at least one container means having disposed therein the above-described 
antigens. 

[0110] The present invention provides methods to identify the expression of an ORF 
of the present invention, or homolog thereof, in a test sample, using one of the antibodies 
of the present invention. Such methods involve incubating a test sample with one or more 
of the antibodies of the present invention and assaying for binding of the antibodies to 
components within the test sample. 

[0111] In a further embodiment, the present invention relates to a method for 
identifying pathogenic E. coli in an animal comprising analyzing tissue or body fluid from 
the animal for a nucleic acid, protein, polypeptide-antigen or antibody specific to one of 
the ORFs described in Tables 1-4 herein from E. coli J96 PAI IV or V. Analysis of
nucleic acid specific to pathogenic E. coli can be by PCR techniques or hybridization
techniques (cf. Molecular Cloning: A Laboratory Manual, second edition, edited by
Sambrook, Fritsch, & Maniatis, Cold Spring Harbor Laboratory, 1989; Eremeeva et al., J.
Clin. Microbiol. 32:803-810 (1994), which describes differentiation among spotted fever
group Rickettsiae species by analysis of restriction fragment length polymorphism of
PCR-amplified DNA).

[0112] Proteins or antibodies specific to pathogenic E. coli may be identified as
described in Molecular Cloning: A Laboratory Manual, second edition, Sambrook et al.,
eds., Cold Spring Harbor Laboratory (1989). More specifically, antibodies may be raised
to E. coli J96 PAI proteins as generally described in Antibodies: A Laboratory Manual,
Harlow and Lane, eds., Cold Spring Harbor Laboratory (1988). E. coli J96 PAI-specific
antibodies can also be obtained from infected animals (Mather, T. et al., JAMA 205:186-188
(1994)).

[0113] In another embodiment, the present invention relates to an antibody having 
binding affinity specifically to an E. coli J96 PAI antigen as described above. The E. coli
J96 PAI antigens of the present invention can be used to produce antibodies or 
hybridomas. One skilled in the art will recognize that if an antibody is desired, a peptide 
can be generated as described herein and used as an immunogen. The antibodies of the 
present invention include monoclonal and polyclonal antibodies, as well as fragments of 
these antibodies. The invention further includes single chain antibodies. Antibody 
fragments which contain the idiotype of the molecule can be generated by known
techniques; for example, such fragments include, but are not limited to, the F(ab')2
fragment, the Fab' fragments, Fab fragments, and Fv fragments.

[0114] Of special interest to the present invention are antibodies to pathogenic E. coli
antigens which are produced in humans, or are "humanized" (i.e., non-immunogenic in a
human) by recombinant or other technology. Humanized antibodies may be produced, for
example, by replacing an immunogenic portion of an antibody with a corresponding, but
non-immunogenic portion (i.e., chimeric antibodies) (Robinson, R.R. et al., International
Patent Publication PCT/US86/02269; Akira, K. et al., European Patent Application
184,187; Taniguchi, M., European Patent Application 171,496; Morrison, S.L. et al.,
European Patent Application 173,494; Neuberger, M.S. et al., PCT Application WO
86/01533; Cabilly, S. et al., European Patent Application 125,023; Better, M. et al.,
Science 240:1041-1043 (1988); Liu, A.Y. et al., Proc. Natl. Acad. Sci. USA 84:3439-3443
(1987); Liu, A.Y. et al., J. Immunol. 139:3521-3526 (1987); Sun, L.K. et al., Proc. Natl.
Acad. Sci. USA 84:214-218 (1987); Nishimura, Y. et al., Canc. Res. 47:999-1005 (1987);
Wood, C.R. et al., Nature 314:446-449 (1985); Shaw et al., J. Natl. Cancer Inst.
80:1553-1559 (1988)). General reviews of "humanized" chimeric antibodies are provided by
Morrison, S.L. (Science 229:1202-1207 (1985)) and by Oi, V.T. et al. (BioTechniques
4:214 (1986)). Suitable "humanized" antibodies can be alternatively produced by CDR or
CEA substitution (Jones, P.T. et al., Nature 321:522-525 (1986); Verhoeyan et al., Science
239:1534 (1988); Beidler, C.B. et al., J. Immunol. 141:4053-4060 (1988)).

[0115] In another embodiment, the present invention relates to a hybridoma which 
produces the above-described monoclonal antibody. A hybridoma is an immortalized cell 
line which is capable of secreting a specific monoclonal antibody. 

[0116] In general, techniques for preparing monoclonal antibodies and hybridomas are 
well known in the art (Campbell, "Monoclonal Antibody Technology: Laboratory
Techniques in Biochemistry and Molecular Biology," Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21
(1980)). 

[0117] In another embodiment, the present invention relates to a method of detecting a 
pathogenic E. coli antigen in a sample, comprising: a) contacting the sample with an
above-described antibody, under conditions such that immunocomplexes form, and b)
detecting the presence of said antibody bound to the antigen. In detail, the methods
comprise incubating a test sample with one or more of the antibodies of the present
invention and assaying whether the antibody binds to the test sample.

[0118] Conditions for incubating an antibody with a test sample vary. Incubation
conditions depend on the format employed in the assay, the detection methods employed, 
and the type and nature of the antibody used in the assay. One skilled in the art will 
recognize that any one of the commonly available immunological assay formats (such as 
radioimmunoassays, enzyme-linked immunosorbent assays, diffusion based Ouchterlony,
or rocket immunofluorescent assays) can readily be adapted to employ the antibodies of 
the present invention. Examples of such assays can be found in Chard, An Introduction to 
Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, 
The Netherlands (1986); Bullock et al., Techniques in Immunocytochemistry, Academic
Press, Orlando, FL Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, Practice and 
Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular 
Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985); and 
Antibodies: A Laboratory Manual, Harlow and Lane, eds., Cold Spring Harbor
Laboratory (1988). 

[0119] The immunological assay test samples of the present invention include cells, 
protein or membrane extracts of cells, or biological fluids such as blood, serum, plasma, or 
urine. The test sample used in the above-described method will vary based on the assay 
format, nature of the detection method and the tissues, cells or extracts used as the sample 
to be assayed. Methods for preparing protein extracts or membrane extracts of cells are 
well known in the art and can readily be adapted in order to obtain a sample which is
compatible with the system utilized.

[0120] In another embodiment, the present invention relates to a method of detecting 
the presence of antibodies to pathogenic E. coli in a sample, comprising: a) contacting the 
sample with an above-described antigen, under conditions such that immunocomplexes 
form, and b) detecting the presence of said antigen bound to the antibody. In detail, the 
methods comprise incubating a test sample with one or more of the antigens of the present 
invention and assaying whether the antigen binds to the test sample. 

[0121] In another embodiment of the present invention, a kit is provided which
contains all the necessary reagents to carry out the previously described methods of 
detection. The kit may comprise: i) a first container means containing an above-described 
antibody, and ii) second container means containing a conjugate comprising a binding 
partner of the antibody and a label. In another preferred embodiment, the kit further 
comprises one or more other containers comprising one or more of the following: wash
reagents and reagents capable of detecting the presence of bound antibodies. Examples of 
detection reagents include, but are not limited to, labeled secondary antibodies, or in the 
alternative, if the primary antibody is labeled, the chromophoric, enzymatic, or antibody 
binding reagents which are capable of reacting with the labeled antibody. The 
compartmentalized kit may be as described above for nucleic acid probe kits.

[0122] One skilled in the art will readily recognize that the antibodies described in the
present invention can readily be incorporated into one of the established kit formats which 
are well known in the art. 

Screening Assay for Binding Agents 

[0123] Using the isolated proteins described herein, the present invention further 
provides methods of obtaining and identifying agents that bind to a protein encoded by an 
E. coli J96 PAI ORF or to a fragment thereof.

[0124] The method involves:

(a) contacting an agent with an isolated protein encoded by an E. coli J96 PAI
ORF, or an isolated fragment thereof; and

(b) determining whether the agent binds to said protein or said fragment.

[0125] The agents screened in the above assay can be, but are not limited to, peptides,
carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be
selected and screened at random or rationally selected or designed using protein modeling
techniques. For random screening, agents such as peptides, carbohydrates, pharmaceutical
agents and the like are selected at random and are assayed for their ability to bind to the 
protein encoded by an ORF of the present invention. 

[0126] Alternatively, agents may be rationally selected or designed. As used herein, an
agent is said to be "rationally selected or designed" when the agent is chosen based on the 
configuration of the particular protein. For example, one skilled in the art can readily adapt 
currently available procedures to generate peptides, pharmaceutical agents and the like 
capable of binding to a specific peptide sequence in order to generate rationally designed 
antipeptide ligands; for example, see Hurby et al., Application of Synthetic Peptides:
Antisense Peptides, in Synthetic Peptides, A User's Guide, W.H. Freeman, NY (1992), pp.
289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989).

[0127] In addition to the foregoing, one class of agents of the present invention can be
used to control gene expression through binding to one of the ORFs or EMFs of the 
present invention. As described above, such agents can be randomly screened or rationally 
designed and selected. Targeting the ORF or EMF allows a skilled artisan to design 
sequence specific or element specific agents, modulating the expression of either a single 
ORF or multiple ORFs that rely on the same EMF for expression control.

[0128] One class of DNA binding agents are those that contain nucleotide base
residues that hybridize or form a triple helix by binding to DNA or RNA. Such agents can
be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of
sulfhydryl or polymeric derivatives having base attachment capacity.

[0129] Agents suitable for use in these methods usually contain 20 to 40 bases and are
designed to be complementary to a region of the gene involved in transcription (triple
helix - see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456
(1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense -
Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of
Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix-formation optimally
results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization 
blocks translation of an mRNA molecule into polypeptide. Both techniques have been 
demonstrated to be effective in model systems. Information contained in the sequences of 
the present invention is necessary for the design of an antisense or triple helix 
oligonucleotide and other DNA binding agents. 
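
The use of sequence information for antisense design can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the 25-base example string are hypothetical, and the sketch only enumerates candidate oligonucleotides of the 20 to 40 base length mentioned above, without scoring specificity, secondary structure, or any other property a practical design would consider. Because an mRNA has the same sequence as the coding strand (with U in place of T), an oligonucleotide complementary to the mRNA is the reverse complement of the corresponding coding-strand window.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def antisense_candidates(coding_strand, length=20):
    """Yield (start, oligo) pairs; each oligo is complementary to one mRNA window."""
    if not 20 <= length <= 40:
        raise ValueError("length is limited here to the 20 to 40 bases named in the text")
    seq = coding_strand.upper()
    for start in range(len(seq) - length + 1):
        yield start, reverse_complement(seq[start:start + length])


# Hypothetical 25-base coding-strand fragment, for illustration only.
example_fragment = "ATGGCTAGCTAGGCTTACGATCGGA"
candidates = list(antisense_candidates(example_fragment, length=20))
```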

Computer Related Embodiments 

[0130] The nucleotide sequence provided in SEQ ID NOs: 1 through 142, 
representative fragments thereof, or nucleotide sequences at least 99.9% identical to the 
sequences provided in SEQ ID NOs: 1 through 142, can be "provided" in a variety of 
media to facilitate use thereof. As used herein, "provided" refers to a manufacture, other 
than an isolated nucleic acid molecule, that contains a nucleotide sequence of the present 
invention, i.e., the nucleotide sequence provided in SEQ ID NOs: 1 through 142, a 
representative fragment thereof, or a nucleotide sequence at least 99.9% identical to SEQ 
ID NOs: 1 through 142. Such a manufacture provides the E. coli J96 PAI subgenomes or 
a subset thereof (e.g., one or more E. coli J96 PAI open reading frame (ORF)) in a form 
that allows a skilled artisan to examine the manufacture using means not directly 
applicable to examining the E. coli J96 PAI subgenome or a subset thereof as it exists in
nature or in purified form. 

[0131] In one application of this embodiment, one or more nucleotide sequences of the 
present invention can be recorded on computer readable media. As used herein, "computer 
readable media" refers to any medium that can be read and accessed directly by a 
computer. Such media include, but are not limited to: magnetic storage media, such as
floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as
CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these 
categories such as magnetic/optical storage media. A skilled artisan can readily appreciate 
how any of the presently known computer readable mediums can be used to create a 
manufacture comprising computer readable medium having recorded thereon a nucleotide 
sequence of the present invention. 

[0132] As used herein, "recorded" refers to a process for storing information on 
computer readable medium. A skilled artisan can readily adopt any of the presently known
methods for recording information on computer readable medium to generate 
manufactures comprising the nucleotide sequence information of the present invention. A 
variety of data storage structures are available to a skilled artisan for creating a computer 
readable medium having recorded thereon a nucleotide sequence of the present invention. 
The choice of the data storage structure will generally be based on the means chosen to
access the stored information. In addition, a variety of data processor programs and 
formats can be used to store the nucleotide sequence information of the present invention 
on computer readable medium. The sequence information can be represented in a word 
processing text file, formatted in commercially-available software such as WordPerfect 
and Microsoft Word, or represented in the form of an ASCII file, stored in a database 
application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt 
any number of data processor structuring formats (e.g., text file or database) in order to
obtain computer readable medium having recorded thereon the nucleotide sequence 
information of the present invention. 
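
Two of the storage formats named above, an ASCII text file and a database table, can be sketched briefly. The sketch makes several assumptions that are not part of the original disclosure: SQLite (Python's standard `sqlite3` module) stands in for the DB2, Sybase, or Oracle systems mentioned in the text, the FASTA-like layout of the text file is one common convention among many, and the record identifiers and residues are placeholders rather than any SEQ ID NO.

```python
import os
import sqlite3
import tempfile

# Placeholder records; identifiers and residues are illustrative only.
RECORDS = {"placeholder_1": "ATGAAACGCATTTAA", "placeholder_2": "ATGCCCGGGTGA"}


def write_ascii(records, path):
    """Record each sequence in an ASCII text file, one header line per sequence."""
    with open(path, "w", encoding="ascii") as handle:
        for seq_id, residues in records.items():
            handle.write(f">{seq_id}\n{residues}\n")


def store_in_database(records, conn):
    """Record each sequence as one row of a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(seq_id TEXT PRIMARY KEY, residues TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?)", records.items())
    conn.commit()


def fetch(conn, seq_id):
    """Return the stored residues for `seq_id`, or None if it is absent."""
    row = conn.execute(
        "SELECT residues FROM sequences WHERE seq_id = ?", (seq_id,)
    ).fetchone()
    return row[0] if row else None


# Text-file form, written to a temporary directory and read back.
with tempfile.TemporaryDirectory() as tmp_dir:
    ascii_path = os.path.join(tmp_dir, "sequences.txt")
    write_ascii(RECORDS, ascii_path)
    with open(ascii_path, encoding="ascii") as handle:
        ascii_text = handle.read()

# Database form, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
store_in_database(RECORDS, conn)
```

Which form is preferable depends, as the text notes, on how the stored information will be accessed: a flat file suits sequential reading by search programs, while a table suits lookup by identifier.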

[0133] By providing the nucleotide sequence of SEQ ID NOs: 1 through 142, 
representative fragments thereof, or nucleotide sequences at least 99.9% identical to SEQ 
ID NOs: 1 through 142, in computer readable form, a skilled artisan can routinely access 
the sequence information for a variety of purposes. Computer software is publicly 
available which allows a skilled artisan to access sequence information provided in a 
computer readable medium. The examples which follow demonstrate how software which 
implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE
(Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system
can be used to identify open reading frames (ORFs) within the E. coli J96 PAI subgenome
that contain homology to ORFs or proteins from other organisms. Such ORFs are
protein-encoding fragments within the E. coli J96 PAI subgenome and are useful in producing
commercially important proteins such as enzymes used in modifying surface O-antigens of
bacteria. A comprehensive list of ORFs encoding commercially important E. coli J96 PAI 
proteins is provided in Tables 1 through 6. 
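
What is meant by locating an open reading frame in recorded sequence data can be shown with a deliberately simplified sketch. It is not BLAST or BLAZE and performs no homology comparison: it only scans the six reading frames of a DNA string for runs that begin with ATG and end at the first in-frame stop codon. The function names, the `min_codons` threshold, and the 16-base example are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=3):
    """Return (strand, start, end, orf) tuples for ATG-to-stop runs in six frames.

    `start` and `end` are 0-based, end-exclusive coordinates on the strand that
    was scanned; `min_codons` counts the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, seq[start:i + 3]))
                    start = None  # resume scanning after the stop codon
    return orfs


# Hypothetical 16-base example containing one short ORF on the forward strand.
example_orfs = find_orfs("CCATGAAATTTTAGGG")
```

Candidate ORFs found in this way would then be compared against known sequences by a homology search program to assign putative functions.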

[0134] The present invention provides a DNA sequence - gene database of 
pathogenicity islands (PAIs) for E. coli involved in infectious diseases. This database is 
useful for identifying and characterizing the basic functions of new virulence genes for E. 
coli involved in uropathogenic and extraintestinal diseases. The database provides a 
number of novel open reading frames that can be selected for further study as described 
herein. 

[0135] Selectable insertion mutations in plasmid subclones encoding PAI genes with 
potentially significant phenotypes for E. coli uropathogenesis and sepsis can be isolated.
The mutations are then crossed back into wild type, uropathogenic E. coli by homologous
recombination to create wild-type strains specifically altered in the targeted gene. The
significance of the genes to E. coli pathogenesis is assessed by in vitro assays and in vivo
murine models of sepsis/peritonitis and ascending urinary tract infection.

[0136] New virulence genes and PAI sites in uropathogenic E. coli may be identified
by the transposon signature-tagged mutagenesis system and negative selection of E. coli
mutants avirulent in murine models of ascending urinary tract infection or peritonitis.

[0137] Epidemiological investigations of new virulence genes and PAIs may be used
to test for their occurrence in the genomes of other pathogenic and opportunistic members
of the Enterobacteriaceae. 

[0138] One can choose from the ORFs included in SEQ ID NOs: 1 through 142, using 
Tables 1 through 6 as a useful guidepost for selecting, as candidates for targeted 
mutagenesis, a limited number of candidate genes within the PAIs based on their 
homology to virulence, export or regulation genes in other pathogens. For the large 
number of apparent genes within the PAIs that do not share sequence similarity to any 
entries in the database, the transposon signature-tagged mutagenesis method developed by 
David Holden's laboratory can be employed as an independent means of virulence gene 
identification. 

[0139] Allelic knock-outs are constructed using different pir-dependent suicide vectors
(Swihart, K.A. and R.A. Welch, Infect. Immun. 58:1853-1869 (1990)). In addition, two
different animal model systems can be employed for assessment of pathogenic
determinants. The initial identification of E. coli hemolysin as a virulence factor came
from the construction of isogenic E. coli strains that were tested in a rat model of
intra-abdominal sepsis (Welch, R.A. et al., Nature (London) 294:665-667 (1981)). The
ascending UTI (Urinary Tract Infection) mouse model was also successfully performed
with allelic knock-outs of the hpmA hemolysin of Proteus mirabilis (Swihart, K.A. and
R.A. Welch, Infect. Immun. 58:1853-1869 (1990)).

[0140] The present invention further provides systems, particularly computer-based 
systems, which contain the sequence information described herein. Such systems are 
designed to identify commercially important fragments of the E. coli J96 PAI subgenome. 
As used herein, "a computer-based system" refers to the hardware means, software means, 
and data storage means used to analyze the nucleotide sequence information of the present 
invention. The minimum hardware means of the computer-based systems of the present 
invention comprises a central processing unit (CPU), input means, output means, and data 
storage means. A skilled artisan can readily appreciate that any one of the currently
available computer-based systems is suitable for use in the present invention.

[0141] As indicated above, the computer-based systems of the present invention
comprise a data storage means having stored therein a nucleotide sequence of the present 
invention and the necessary hardware means and software means for supporting and 
implementing a search means. As used herein, "data storage means" refers to memory that
can store nucleotide sequence information of the present invention, or a memory access 
means which can access manufactures having recorded thereon the nucleotide sequence 
information of the present invention. As used herein, "search means" refers to one or 
more programs which are implemented on the computer-based system to compare a target 
sequence or target structural motif with the sequence information stored within the data 
storage means. Search means are used to identify fragments or regions of the E. coli 
genome that match a particular target sequence or target motif. A variety of known
algorithms are disclosed publicly and a variety of commercially available software for
conducting search means are available and can be used in the computer-based systems of
the present invention. Examples of such software include, but are not limited to,
MacPattern (EMBL), BLASTN and BLASTX (NCBI). A skilled artisan can readily
recognize that any one of the available algorithms or implementing software packages for 
conducting homology searches can be adapted for use in the present computer-based 
systems. 
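
A "search means" in its simplest form can be sketched as an exact-match scan of stored sequences for a target sequence on either strand. This is an illustration only and is far simpler than the homology search programs named above, which tolerate mismatches and gaps; the function names, fragment identifiers, and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string written 5' to 3'."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_occurrences(target, stored_sequences):
    """Return (seq_id, strand, position) for every exact match of `target`.

    Positions are 0-based on the stored (forward) sequence; a "-" strand hit
    means the reverse complement of the target was found at that position.
    """
    target = target.upper()
    queries = [("+", target)]
    rc = reverse_complement(target)
    if rc != target:  # avoid reporting palindromic targets twice
        queries.append(("-", rc))
    hits = []
    for seq_id, seq in stored_sequences.items():
        seq = seq.upper()
        for strand, query in queries:
            pos = seq.find(query)
            while pos != -1:
                hits.append((seq_id, strand, pos))
                pos = seq.find(query, pos + 1)  # allow overlapping matches
    return hits


# Hypothetical stored fragments and a six-nucleotide target sequence.
stored = {"frag_a": "TTACGTTGCC", "frag_b": "GGCAACGTAA"}
hits = find_occurrences("ACGTTG", stored)
```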

[0142] As used herein, a "target sequence" can be any DNA or amino acid sequence of 
six or more nucleotides or two or more amino acids. A skilled artisan can readily 
recognize that the longer a target sequence is, the less likely a target sequence will be 
present as a random occurrence in the database. The most preferred sequence length of a 
target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide
residues. However, it is well recognized that target sequences used in searches for
commercially important fragments of the E. coli J96 PAI subgenome, such as sequence
fragments involved in gene expression and protein processing, may be of shorter length.
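
The relation between target length and chance occurrence can be made concrete with a rough estimate. Assuming, as a simplification introduced here, that residues are independent and equally frequent, a target of length L is expected to occur by chance about (N - L + 1) / k^L times on one strand of a database of N residues, where k is 4 for nucleotides and 20 for amino acids. The function name and the database sizes below are hypothetical.

```python
def expected_random_matches(target_length, database_length, alphabet_size=4):
    """Expected chance occurrences of one target under a uniform random model."""
    positions = max(database_length - target_length + 1, 0)
    return positions * alphabet_size ** (-target_length)


# A six-nucleotide target is expected about once by chance in roughly 4 kb ...
six_mer = expected_random_matches(6, 4101)
# ... whereas a 30-nucleotide target is essentially never expected by chance.
thirty_mer = expected_random_matches(30, 100_000)
```

Real sequences are neither uniform nor independent, so these values indicate orders of magnitude rather than exact frequencies.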

[0143] As used herein, "a target structural motif," or "target motif," refers to any 
rationally selected sequence or combination of sequences in which the sequence(s) are 
chosen based on a three-dimensional configuration which is formed upon the folding of 
the target motif. There are a variety of target motifs known in the art. Protein target 
motifs include, but are not limited to, enzymic active sites and signal sequences. Nucleic 
acid target motifs include, but are not limited to, promoter sequences, hairpin structures 
and inducible expression elements (protein binding sequences). 

[0144] Thus, the present invention further provides an input means for receiving a 
target sequence, a data storage means for storing the target sequence and the homologous 
E. coli J96 PAI sequence identified using a search means as described above, and an
output means for outputting the identified homologous E. coli J96 PAI sequence. A
variety of structural formats for the input and output means can be used to input and output
information in the computer-based systems of the present invention. A preferred format
for an output means ranks fragments of the E. coli J96 PAI subgenome possessing varying
degrees of homology to the target sequence or target motif. Such presentation provides a 
skilled artisan with a ranking of sequences which contain various amounts of the target 
sequence or target motif and identifies the degree of homology contained in the identified 
fragment. 
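
A ranked output of the kind described above can be sketched as follows. The scoring used here, the best ungapped fraction of identical positions between the target and any window of a fragment, is an assumption chosen for brevity and is not the scoring used by BLAST or any other named program; the function names and fragments are hypothetical.

```python
def best_identity(target, fragment):
    """Return the highest fraction of matching positions over all ungapped offsets."""
    target, fragment = target.upper(), fragment.upper()
    if not target:
        raise ValueError("target must not be empty")
    if len(fragment) < len(target):
        return 0.0
    best = 0
    for offset in range(len(fragment) - len(target) + 1):
        # zip stops at the end of the target, so each window has the target's length
        matches = sum(a == b for a, b in zip(target, fragment[offset:]))
        best = max(best, matches)
    return best / len(target)


def rank_fragments(target, fragments):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, best_identity(target, seq)) for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical fragments with decreasing similarity to the target.
fragments = {
    "frag_3": "GGGGGGGGGG",
    "frag_2": "TTACGAACTT",
    "frag_1": "TTACGTACTT",
}
ranking = rank_fragments("ACGTAC", fragments)
```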

[0145] A variety of comparing means can be used to compare a target sequence or 
target motif with the data storage means to identify sequence fragments of the E. coli J96
PAI subgenomes. For example, software which implements the BLAST and
BLAZE algorithms (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) can be used to
identify open reading frames within the E. coli J96 PAI subgenome. A skilled artisan can
readily recognize that any one of the publicly available homology search programs can be 
used as the search means for the computer-based systems of the present invention. 
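
By way of illustration only, the ranked output described above can be sketched in a few lines of Python. The sketch below is not BLAST or BLAZE; it scores each stored fragment by its best ungapped window identity to the target and sorts the fragments by that score. The function names and the example fragment names are hypothetical and are introduced here solely for illustration.

```python
from typing import Dict, List, Tuple


def best_window_identity(target: str, fragment: str) -> float:
    """Return the highest fraction of identical bases over all ungapped
    placements of `target` inside `fragment` (0.0 if it does not fit)."""
    target, fragment = target.upper(), fragment.upper()
    n = len(target)
    if n == 0 or len(fragment) < n:
        return 0.0
    best = 0
    for start in range(len(fragment) - n + 1):
        window = fragment[start:start + n]
        matches = sum(1 for a, b in zip(target, window) if a == b)
        best = max(best, matches)
    return best / n


def rank_fragments(target: str, fragments: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank stored fragments by decreasing identity to the target."""
    scored = [(name, best_window_identity(target, seq))
              for name, seq in fragments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical stored fragments; real data would come from the
    # data storage means described in the text.
    stored = {
        "frag_a": "TTGACAGCTAGC",
        "frag_b": "AATTGACTGCTAGC",
        "frag_c": "CCCCCCCCCCCC",
    }
    for name, identity in rank_fragments("TTGACAGC", stored):
        print(f"{name}\t{identity:.3f}")
```

A production search means would use a gapped, statistically scored comparison such as a publicly available homology search program; the exhaustive window scan here is chosen only because it is short and easy to verify.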
[0146] One application of this embodiment is provided in Figure 2. Figure 2 provides 
a block diagram of a computer system 102 that can be used to implement the present 
invention. The computer system 102 includes a processor 106 connected to a bus 104. 
Also connected to the bus 104 are a main memory 108 (preferably implemented as random 
access memory, RAM) and a variety of secondary storage devices 110, such as a hard
drive 112 and a removable medium storage device 114. The removable medium storage
device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic
tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a
magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted 
into the removable medium storage device 114. The computer system 102 includes 
appropriate software for reading the control logic and/or the data from the removable 
medium storage device 114 once inserted in the removable medium storage device 114. 
[0147] A nucleotide sequence of the present invention may be stored in a well known 
manner in the main memory 108, any of the secondary storage devices 110, and/or a 
removable storage medium 116. Software for accessing and processing the genomic
sequence (such as search tools, comparing tools, etc.) resides in main memory 108 during
execution. 

[0148] Having generally described the invention, the same will be more readily 
understood by reference to the following examples, which are provided by way of 
illustration and are not intended as limiting. 

Experimental 

Example 1: High Through-put Sequencing of Cosmid Clones Covering PAI IV and PAI
V in E. coli J96

[0149] The complete DNA sequence of the pathogenicity islands, PAI IV and PAI V 
(>170 kb and ~110 kb, respectively) from uropathogenic E. coli strain J96 (O4:K6) was
determined using a strategy, cloning and sequencing method, data collection and assembly
software essentially identical to those used by the TIGR group for determining the
sequence of the Haemophilus influenzae genome (Fleischmann, R.D., et al., Science
269:496 (1995)). The sequences were then used for DNA and protein sequence similarity
searches of the databases as described in Fleischmann, Id.

[0150] The analysis of the genetic information found within the PAIs of E. coli J96 
was facilitated by the use of overlapping cosmid clones possessing these unique segments 
of DNA. These cosmid clones were previously constructed and mapped (as further 
described below) as an overlapping set in the laboratory of Dr. Doug Berg (Washington 
University). A gap exists between the left portion of cosmid 2 and the end of the PAI IV
that would represent the pheV junction to the E. coli K-12 genome.

[0151] Uropathogenic strain E. coli J96 (O4:K6) was used as a source of chromosomal
DNA for construction of a cosmid library. E. coli K-12 DH5α and DH12 (Gibco/BRL,
Gaithersburg, Md.) were used as hosts for maintaining cosmid and plasmid clones. The
cosmid library of E. coli J96 DNA was constructed essentially as described by Bukanov &
Berg (Mol. Microbiol. 11:509-523 (1994)). DNA was digested with Sau3AI under
conditions that generated fragments with an average size of 40 to 50 kb and
electrophoresed through 1% agarose gels. Fragments of 35 to 50 kb were isolated and
cloned into Lorist 6 vector that had been linearized with BamHI and treated with bacterial
alkaline phosphatase to block self-ligation. (Lorist 6 is a 5.2-kb moderate-copy-number
cosmid vector with T7 and SP6 promoters close to the cloning site.) Cloned DNA was
packaged in lambda phage particles in vitro by using a commercial kit (Amersham,
Arlington Heights, IL) and cosmid-containing phage particles were used to transduce E.
coli DH5α. Transductant colonies were transferred to 150 µl of Luria-Bertani broth
supplemented with kanamycin in 96-well microtiter plates and grown overnight at 37°C
with shaking. Two sets of clones, one for each PAI, were ultimately assembled, as
previously described (Swenson et al., Infection and Immunity 64:3736-3743 (1996),
fully incorporated by reference herein).

[0152] The two sets of clones contain eleven sub-clones that were employed in the
sequencing method described below. One set of four overlapping cosmid clones covers
the prs-containing PAI V, ATCC Deposit No. 97727, deposited September 23, 1996. A
second set of seven subclones covers much of the pap-containing PAI IV, ATCC Deposit
No. 97726, deposited September 23, 1996. See Figure 1.

[0153] A high throughput, random sequencing method (Fleischmann et al., Science
269:496 (1995); Fraser et al., Science 270:397 (1995)) was used to obtain the sequences
for 142 fragments (contigs) of E. coli J96 PAIs. All clones were sequenced from both
ends to aid in the eventual ordering of contigs during the sequence assembly process.
Briefly, random libraries of ~2 kb clones covering the two J96 PAIs were constructed,
~2,800 clones were subjected to automated sequencing (~450 nt/clone) and preliminary
assemblies of the sequences were accomplished, which resulted in 142 contigs across the
two PAIs, totaling 95 and 135 kb, respectively. The estimated sizes of PAI IV and PAI V
based on the overlapping cosmid clones are 1.7 X 10^5 and 1.1 X 10^5 bp, respectively.
The 142 sequences were assembled by means of the TIGR Assembler (Fleischmann et al.;
Fraser et al.; Sutton et al., Genome Sci. Tech. 1:9 (1995)). Sequence and physical gaps
were closed using a combination of strategies (Fleischmann et al.; Fraser et al.). Presently,
the average depth of sequencing for each base assembled in the contigs is 6-fold. The
tentative identity of many genes based on sequence homology is covered in Tables 1, 3, 5
and 6.
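
As a rough consistency check on the figures above, the average depth of sequencing can be estimated as (number of reads x mean read length) / assembled length. The short Python sketch below applies that arithmetic to the approximate values stated in this paragraph; the function name is hypothetical, and the inputs are the rounded values from the text rather than exact counts.

```python
def fold_coverage(n_reads: int, mean_read_length_nt: float,
                  assembled_length_bp: float) -> float:
    """Average sequencing depth: total sequenced bases / assembled bases."""
    return n_reads * mean_read_length_nt / assembled_length_bp


if __name__ == "__main__":
    n_reads = 2800                  # ~2,800 clones sequenced
    read_length = 450               # ~450 nt per clone
    assembled = 95_000 + 135_000    # contig totals of 95 kb and 135 kb
    depth = fold_coverage(n_reads, read_length, assembled)
    print(f"approximate depth: {depth:.1f}-fold")
```

With these rounded inputs the estimate comes to roughly 5.5-fold, which is in line with the approximately 6-fold average depth reported for the assembled contigs.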

[0154] Open reading frames (ORFs) and predicted protein-coding regions were 
identified as described (Fleischmann et al.; Fraser et al.) with some modification. In
particular, the statistical prediction of uropathogenic E. coli J96 pathogenicity island genes
was performed with GeneMark (Borodovsky, M. & McIninch, J., Comput. Chem. 17:123
(1993)). Regular GeneMark uses nonhomogeneous Markov models derived from a
training set of coding sequences and ordinary Markov models derived from a training set
of noncoding sequences. The ORFs in Tables 1-6 were identified by GeneMark using a
second-order Markov model trained from known E. coli coding regions and known E. coli
non-coding regions. Among the important genes that are implicated in the virulence of
E. coli J96 PAIs are adhesins, excretion pathway proteins, proteins that participate in
alterations of the O-antigen in the PAIs, cytotoxins, and two-component (membrane 
sensor/DNA binding) proteins. 
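
The general idea of scoring a sequence with a second-order Markov model of coding DNA against one of noncoding DNA can be sketched as follows. This is a simplified illustration and is not the GeneMark program: it assumes the coding sequence is supplied in frame from its first base, uses add-one pseudocounts, and treats unseen contexts as uniform. All function names and the toy training sequences are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List

BASES = "ACGT"
Model = List[Dict[str, Dict[str, float]]]


def train_markov2(seqs: Iterable[str], n_phases: int) -> Model:
    """Estimate P(base | two preceding bases), one table per codon phase.
    n_phases=3 gives a 3-periodic (nonhomogeneous) model; n_phases=1 gives
    an ordinary (homogeneous) model."""
    counts = [defaultdict(lambda: {b: 1.0 for b in BASES})  # add-one pseudocounts
              for _ in range(n_phases)]
    for seq in seqs:
        seq = seq.upper()
        for i in range(2, len(seq)):
            ctx, base = seq[i - 2:i], seq[i]
            if base in BASES and all(c in BASES for c in ctx):
                counts[i % n_phases][ctx][base] += 1.0
    model: Model = []
    for table in counts:
        probs = {}
        for ctx, row in table.items():
            total = sum(row.values())
            probs[ctx] = {b: row[b] / total for b in BASES}
        model.append(probs)
    return model


def log_likelihood(seq: str, model: Model) -> float:
    """Sum of log transition probabilities of `seq` under `model`."""
    n_phases = len(model)
    total = 0.0
    for i in range(2, len(seq)):
        row = model[i % n_phases].get(seq[i - 2:i])
        p = row[seq[i]] if row and seq[i] in row else 0.25  # unseen: uniform
        total += math.log(p)
    return total


def coding_log_odds(seq: str, coding: Model, noncoding: Model) -> float:
    """Positive values favor the coding model, negative the noncoding model."""
    seq = seq.upper()
    return log_likelihood(seq, coding) - log_likelihood(seq, noncoding)


if __name__ == "__main__":
    # Toy training sets standing in for known coding and noncoding regions.
    coding = train_markov2(["ATGGCTGCTGCTGCTGCTGCTTAA"], n_phases=3)
    noncoding = train_markov2(["ATATATATATATATATATATAT"], n_phases=1)
    for candidate in ("ATGGCTGCTGCTTAA", "ATATATATATAT"):
        print(candidate, round(coding_log_odds(candidate, coding, noncoding), 2))
```

In practice the models would be trained on large sets of known E. coli coding and noncoding regions, and candidate ORFs would be scored in all six reading frames; the toy data here serve only to show the mechanics.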


I. Adhesins.

[0155] It is believed that the principal adhesin determinants involved in 
uropathogenicity that are present within PAIs of uropathogenic E. coli are the pili encoded 
by the pap-related operons (Hultgren et al., Infect. Immun. 50:370-377 (1993); Stromberg
et al., EMBO J. 9:2001-2010 (1990); High et al., Infect. Immun. 56:513-517 (1988)) and
the distantly related afimbrial adhesins (Labigne-Roussel et al., Infect. Immun. 46:251-259
(1988)). The presence of two of these (pap and prs) has been confirmed. In addition,
potential genes for five other adhesins including sla (described above), AIDA-I (diffuse 
adherence-DEAC), hra (heat resistant hemagglutinin-ETEC), fha (filamentous 
hemagglutinin- Bordetella pertussis) and the arg-gingipain proteinase of Porphyromonas 
gingivalis have been found. 

II. Type II exoprotein secretion pathway.
[0156] Highly significant statistics support the presence of multiple genes involved in 
the type II exoprotein pathway. Curiously, perhaps two different determinants appear to 
be present in PAI IV where one set of genes has the highest sequence similarity to eps-like 
genes (Vibrio cholerae Ctx export) and the other has greatest similarity to exe genes 
(Aeromonas hydrophila aerolysin and protease export). At present, the assembly of
contigs involving these potential genes is incomplete. Thus, it is uncertain if two separate 
and complete determinants are present. However, it is clear that these genes are newly 
discovered and novel to pathogenic E. coli because the derived sequences do not have 
either the bfp or hop genes as the highest matches. The gene products that are the target of 
the type II export pathway are not evident at this time. 

[0157] Within PAI IV there are sequences which suggest genes very similar to secD 
and secF. These two linked genes encode homologous products that are localized to the 
inner membrane and are hypothesized to play a late role in the translocation of
leader-peptide-containing proteins across the inner membrane of gram-negative bacteria. In
addition, in each PAI, sequences are found that are reminiscent of the heat-shock
htrA/degA gene that encodes a periplasmic protease. They may perform
endochaperone-like functions, as Pugsley et al. have hypothesized for different exoprotein pathways.

III. O-antigen/capsule/carbohydrate modification (Nod genes).
[0158] J96 has the O4 antigen. The O-antigen portion of lipopolysaccharide is encoded by rfb
genes that are located at 45 min. on the E. coli chromosome. We have found in both PAIs
a cumulative total of five possible rfb-like genes which could participate in alterations of the
O-antigen in the PAIs. Overall these data suggest that PAIs provide the genetic potential
for greater change of the cell surface for uropathogenic E. coli strains than what was
previously known.

[0159] The apparent capsule type for strain J96 is a non-sialic acid K6-type. Sequence 
similarity "hits" were made in the PAI IV region to two region-1 capsule genes, kpsS and kpsE,
involved in the stabilization of polysaccharide synthesis and polysaccharide export across
the inner membrane. This is not altogether surprising based on the genetic mapping of the
kps locus to serA at 63 minutes on the genome of the K1 capsular type of E. coli. This
suggests that these kps-like genes either are participating in the K6-biosynthesis or perhaps
are involved in complex carbohydrate export for other purposes.

[0160] An intriguing discovery is the set of hits made on genes involved in bacteria-plant
interactions by Rhizobium, Bradyrhizobium and Agrobacterium. Four potential genes
identified thus far share significant sequence similarity to genes encoding products that
modify lipo-oligosaccharides that influence nodule morphogenesis on legume roots.
These are: ORF140, carbamyl phosphate synthetase; nodulation protein 1265;
phosphate-regulatory protein; and an ORF at a plant-inducible locus in Agrobacterium. To date there
are no descriptions in the literature of such gene products being utilized by human or 
animal bacterial pathogens for the purposes of modification or secretion of extracellular 
carbohydrate. However, the sequence similarity to the capsular region-2 genes and to 
lipooligosaccharide biosynthetic genes in Rhizobium spp has been recently noted by Petit 
(1995). 

IV. Cytotoxins.

[0161] Besides the previously known hemolysin and CNF toxins in the PAIs, in each
PAI sequences similar to the shlBA operon (cosmids 5 and 12) were found for a cytolytic
toxin from Serratia marcescens and Proteus mirabilis. Ironically, the P. mirabilis
hemolysin (HpmA) member of this family of toxins was discovered by Uphoff and Welch
(1990), but not thought to exist in other members of the Enterobacteriaceae (Swihart
(1990)). A shlB-like transporter does also appear to be involved in the export of the
filamentous hemagglutinin of Bordetella pertussis which was described above and a cell
surface adhesin of Haemophilus influenzae. It has been demonstrated that cosmid #5 of E.
coli J96 encodes an extracellular protein that is ~180 kDa and cross-reactive to polyclonal
antisera to the P. mirabilis HpmA hemolysin. Thus, there is evidence suggesting there is
a new member of this family of proteins in extraintestinal E. coli isolates. In addition, there
is also a hit on the FhaC hemolysin-like gene within the PAI V although its statistical
significance for the sequence thus far available is only 0.0043.

V. Regulators.

[0162] A common regulatory motif in bacteria is the two-component (membrane
sensor/DNA binding) protein system. In numerous instances in pathogenic bacteria, external
signals in the environment cause membrane-bound protein kinases to phosphorylate a
cytoplasmic protein which in turn acts as either a negative or positive effector of
transcription of large sets of operons. On cosmid 11 representing PAI V were found, in
two different PstI clones, sequences for two-component regulators (similar probabilities
for OmpR/AlgB and, separately, RcsC; probabilities at the 10^-22 level).

[0163] In addition, the phosphoglycerate transport system (pgtA, pgtC and pgtP)
including the pgtB regulator is present in PAI IV. This transport system, which was
originally described in S. typhimurium, is not appreciated as a component of any
pathogenic E. coli genome. The operon had been previously mapped at 49 minutes near or
within one of the S. typhimurium chromosome-specific loops not present in the K-12
genome. It should be noted that the E. coli K-12 glpT gene product is similar to pgtP gene 
product (37% identity), but the E. coli J96 genes are clearly homologs to the pgt genes and 
their linkage within the middle of PAI IV element (cosmid #4) is suspicious. 

VI. Mobile genetic elements.
[0164] There are numerous sequences that share similarity to genes found on insertion 
elements, plasmids and phages. The temperate bacteriophage P4 inserts within tRNA loci 
in the E. coli chromosome. The hypothesis was made that PAIs are the result of 
bacteriophage P4-virulence gene recombination events (Blum et al., Infect. Immun.
62:606-614 (1994)). Data supporting this hypothesis were found during our sequencing
with the identification of P4-like sequences in each of the PAIs (cosmids 7 and 9). This is
a very important preliminary result which supports the hypothesis that PAIs can be
identified by common sequence or genetic elements. However, there are indications that
multiple mobile genetic elements were involved in the evolution of the J96 PAIs. Conjugal
plasmid-related sequences may also be present at two different locations (F factor and R1
plasmid). Sequences for multiple transposable elements are present that are likely to have
originated from different bacterial genera (Tn1000, IS630, IS911, IS100, IS21, IS1203,
IS5376 (B. stearothermophilus) and RHS). Of particular interest is IS100, which was
originally identified in Yersinia pestis (Fetherston et al., Mol. Microbiol. 6:2693-2704
(1992)). The presence of IS100 is significant because it has been associated with the
termini of a large chromosomal element encoding pigmentation and some aspect of
virulence in Y. pestis. This element undergoes spontaneous deletions similar to the PAIs
from E. coli 536 (Fetherston et al., Mol. Microbiol. 6:2693-2704 (1992)) and appears to
participate in plasmid-chromosome rearrangements. This element was not previously 
known to be in genera outside of Yersinia. 

[0165] The discovery of the apparent att site for bacteriophage P2 in the PAIs is 
interesting. P2 acts as a helper phage for the P4 satellite phage. The P2 att site is at 44 
min in the K-12 genome. The significance of this hit is unknown at present, but may be 
explained as either a cloning artifact (some K-12 fragments in the PstI library of cosmid
5) or evidence of some curious chromosomal-P4/P2 phage history. It may indicate that
the J96 PAIs are composites of multiple smaller PAIs. 

Example 2: Preparation of PCR Primers and Amplification of DNA
[0166] Various fragments of the sequenced E. coli J96 PAIs, such as those disclosed in 
Tables 1 through 6 can be used, in accordance with the present invention, to prepare PCR 
primers. The PCR primers are preferably at least 15 bases, and more preferably at least 18 
bases in length. When selecting a primer sequence, it is preferred that the primer pairs 
have approximately the same G/C ratio, so that melting temperatures are approximately 
the same. The PCR primers are useful during PCR cloning of the ORFs described herein. 
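
The primer selection preferences stated above (length of at least 18 bases, matched G/C content, similar melting temperatures) can be checked with a short script such as the following sketch. The melting temperature is estimated with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough approximation for short oligonucleotides; the tolerance values, function names and example primers are assumptions chosen for illustration and are not taken from the specification.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (degrees C) by the Wallace rule."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def compatible_pair(fwd: str, rev: str, min_len: int = 18,
                    max_gc_diff: float = 0.10, max_tm_diff: int = 4) -> bool:
    """True if both primers meet the length preference and have similar
    G/C content and estimated melting temperature (assumed tolerances)."""
    if len(fwd) < min_len or len(rev) < min_len:
        return False
    if abs(gc_fraction(fwd) - gc_fraction(rev)) > max_gc_diff:
        return False
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff


if __name__ == "__main__":
    # Hypothetical 18-mers, not derived from the J96 PAI sequences.
    fwd = "ATGGCTAGCAAGCTTGGA"
    rev = "TTAGCCGATCGGATCCAT"
    print(gc_fraction(fwd), wallace_tm(fwd))
    print(gc_fraction(rev), wallace_tm(rev))
    print(compatible_pair(fwd, rev))
```

More accurate nearest-neighbor melting temperature models exist and would normally be preferred for final primer design.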

Example 3: Gene expression from DNA Sequences Corresponding to ORFs 

[0167] A fragment of an E. coli J96 PAI (preferably, a protein-encoding sequence
provided in Tables 1 through 6) is introduced into an expression vector using conventional
technology (techniques to transfer cloned sequences into expression vectors that direct
protein translation in mammalian, yeast, insect or bacterial expression systems are well
known in the art). Commercially available vectors and expression systems are available
from a variety of suppliers including Stratagene (La Jolla, California), Promega (Madison,
Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and
facilitate proper protein folding, the codon context and codon pairing of the sequence may
be optimized for the particular expression organism, as explained by Hatfield et al., U.S.
Pat. No. 5,082,767, which is hereby incorporated by reference.

[0168] The following is provided as one exemplary method to generate polypeptide(s)
from a cloned ORF of an E. coli J96 PAI whose sequence is provided in SEQ ID NOs: 1
through 142. A poly A sequence can be added to the construct by, for example, splicing
out the poly A sequence from pSG5 (Stratagene) using BglI and SalI restriction
endonuclease enzymes and incorporating it into the mammalian expression vector pXT1
(Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a
portion of the gag gene from Moloney Murine Leukemia Virus. The position of the LTRs
in the construct allows efficient stable transfection. The vector includes the Herpes
Simplex thymidine kinase promoter and the selectable neomycin gene. The E. coli J96
PAI DNA is obtained by PCR from the bacterial vector using oligonucleotide primers
complementary to the E. coli J96 PAI DNA and containing restriction endonuclease
sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the
corresponding E. coli J96 PAI DNA 3' primer, taking care to ensure that the E. coli J96
PAI DNA is positioned such that it is followed by the poly A sequence. The purified
fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with
an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly
A sequence and digested with BglII.

[0169] The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin 
(Life Technologies, Inc., Grand Island, New York) under conditions outlined in the 
product specification. Positive transfectants are selected after growing the transfected 
cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri). The protein is preferably released
into the supernatant. However, if the protein has membrane binding domains, the protein
may additionally be retained within the cell or expression may be restricted to the cell 
surface. 

[0170] Since it may be necessary to purify and locate the transfected product, synthetic 
15-mer peptides synthesized from the predicted E. coli J96 PAI DNA sequence are
injected into mice to generate antibody to the polypeptide encoded by the E. coli J96 PAI
DNA.

[0171] If antibody production is not possible, the E. coli J96 PAI DNA sequence is
additionally incorporated into eukaryotic expression vectors and expressed as a chimeric
with, for example, β-globin. Antibody to β-globin is used to purify the chimeric.
Corresponding protease cleavage sites engineered between the β-globin gene and the E.
coli J96 PAI DNA are then used to separate the two polypeptide fragments from one
another after translation. One useful expression vector for generating β-globin chimerics
is pSG5 (Stratagene). This vector encodes rabbit β-globin. Intron II of the rabbit β-globin
gene facilitates splicing of the expressed transcript, and the polyadenylation signal
incorporated into the construct increases the level of expression. These techniques as
described are well known to those skilled in the art of molecular biology. Standard
methods are available from the technical assistance representatives from Stratagene, Life
Technologies, Inc., or Promega. Polypeptides may additionally be produced from either
construct using in vitro translation systems such as In vitro Express™ Translation Kit
(Stratagene).

Example 4: E. coli Expression of an E. coli J96 PAI ORF and protein purification

[0172] An E. coli J96 PAI ORF described in Tables 1 through 6 is selected and
amplified using PCR oligonucleotide primers designed from the nucleotide sequences
flanking the selected ORF and/or from portions of the ORF's NH2- or COOH-terminus.
Additional nucleotides containing restriction sites to facilitate cloning are added to the 5'
and 3' sequences, respectively.

[0173] The restriction sites are selected to be convenient to restriction sites in the 
bacterial expression vector pQE60. The bacterial expression vector pQE60 is used for 
bacterial expression in this example. (QIAGEN, Inc., 9259 Eton Avenue, Chatsworth, CA, 
91311). pQE60 encodes ampicillin antibiotic resistance ("Ampr") and contains a bacterial
origin of replication ("ori"), an IPTG inducible promoter, a ribosome binding site ("RBS"),
six codons encoding histidine residues that allow affinity purification using
nickel-nitrilo-tri-acetic acid ("Ni-NTA") affinity resin sold by QIAGEN, Inc., supra, and suitable single
restriction enzyme cleavage sites. These elements are arranged such that a DNA fragment
encoding a polypeptide may be inserted in such a way as to produce that polypeptide with
the six His residues (i.e., a "6 X His tag") covalently linked to the carboxyl terminus of 
that polypeptide. 

[0174] The DNA sequence encoding the desired portion of an E. coli J96 PAI is 
amplified from the deposited cDNA clone using PCR oligonucleotide primers which
anneal to the amino terminal sequences of the desired portion of the E. coli protein and to
sequences in the deposited construct 3' to the cDNA coding sequence. Additional
nucleotides containing restriction sites to facilitate cloning in the pQE60 vector are added
to the 5' and 3' sequences, respectively.

[0175] The amplified E. coli J96 PAI DNA fragments and the vector pQE60 are
digested with one or more appropriate restriction enzymes, such as SalI and XbaI, and the
digested DNAs are then ligated together. Insertion of the E. coli J96 PAI DNA into the
restricted pQE60 vector places the E. coli J96 PAI protein coding region, including its
associated stop codon, downstream from the IPTG-inducible promoter and in-frame with
an initiating AUG. The associated stop codon prevents translation of the six histidine
codons downstream of the insertion point. 
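
The addition of restriction sites to the 5' ends of the amplification primers, as described in the two preceding paragraphs, can be sketched as follows. The recognition sequences for SalI (GTCGAC) and XbaI (TCTAGA) are standard; the two-base 5' clamp, the 18-base annealing length, the function names and the toy ORF are assumptions made for illustration. The sketch also rejects inserts that contain either site internally, since such inserts would be cut during the digestion step; it does not check reading frame relative to the vector.

```python
SITES = {"SalI": "GTCGAC", "XbaI": "TCTAGA"}  # palindromic recognition sites
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(orf: str, fwd_site: str, rev_site: str,
                   anneal_len: int = 18, clamp: str = "GC"):
    """Return (forward, reverse) primers, each written 5'->3' as
    clamp + restriction site + annealing region. The reverse primer anneals
    to the 3' end of the ORF, including its stop codon."""
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing length")
    for name in (fwd_site, rev_site):
        if SITES[name] in orf:
            raise ValueError(f"insert contains an internal {name} site")
    forward = clamp + SITES[fwd_site] + orf[:anneal_len]
    reverse = clamp + SITES[rev_site] + reverse_complement(orf[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    # Toy ORF for illustration only; not a J96 PAI sequence.
    toy_orf = "ATGGCTAAAGGTCCTGAACTGTTCACCGGTTAA"
    fwd, rev = tailed_primers(toy_orf, "SalI", "XbaI")
    print("forward:", fwd)
    print("reverse:", rev)
```

Because both recognition sites are palindromic, scanning one strand of the insert is sufficient to detect internal sites.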

[0176] The ligation mixture is transformed into competent E. coli cells using standard 
procedures such as those described in Sambrook et al., Molecular Cloning: a Laboratory
Manual, 2nd Ed.; Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989).
E. coli strain M15/rep4, containing multiple copies of the plasmid pREP4, which expresses
the lac repressor and confers kanamycin resistance ("Kanr"), is used in carrying out the
illustrative example described herein. This strain, which is only one of many that are
suitable for expressing an E. coli J96 PAI protein, is available commercially from
QIAGEN, Inc., supra. Transformants are identified by their ability to grow on LB plates 
in the presence of ampicillin and kanamycin. Plasmid DNA is isolated from resistant 
colonies and the identity of the cloned DNA confirmed by restriction analysis, PCR and 
DNA sequencing. 

[0177] Clones containing the desired constructs are grown overnight ("O/N") in liquid 
culture in LB media supplemented with both ampicillin (100 µg/ml) and kanamycin (25
µg/ml). The O/N culture is used to inoculate a large culture, at a dilution of approximately
1:25 to 1:250. The cells are grown to an optical density at 600 nm ("OD600") of between
0.4 and 0.6. Isopropyl-β-D-thiogalactopyranoside ("IPTG") is then added to a final
concentration of 1 mM to induce transcription from the lac repressor sensitive promoter,
by inactivating the lacI repressor. Cells subsequently are incubated further for 3 to 4
hours. Cells then are harvested by centrifugation.
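
The volumes implied by the dilution and induction conditions above follow from simple proportionality (C1 x V1 = C2 x V2). The sketch below works them through for an assumed 500 ml culture and an assumed 1 M IPTG stock; neither value is specified in the text, and the function names are hypothetical. The small volume added by the stock itself is ignored.

```python
def inoculum_volume_ml(culture_volume_ml: float, dilution: float) -> float:
    """Volume of overnight culture for a 1:dilution inoculation."""
    return culture_volume_ml / dilution


def stock_volume_ml(final_mM: float, culture_volume_ml: float,
                    stock_mM: float) -> float:
    """Volume of stock needed to reach the final concentration (C1V1 = C2V2)."""
    return final_mM * culture_volume_ml / stock_mM


if __name__ == "__main__":
    culture_ml = 500.0      # assumed size of the large culture
    iptg_stock_mM = 1000.0  # assumed 1 M IPTG stock
    print("inoculum at 1:25  :", inoculum_volume_ml(culture_ml, 25), "ml")
    print("inoculum at 1:250 :", inoculum_volume_ml(culture_ml, 250), "ml")
    print("IPTG for 1 mM     :", stock_volume_ml(1.0, culture_ml, iptg_stock_mM), "ml")
```

For the assumed 500 ml culture this corresponds to 2 to 20 ml of overnight culture and 0.5 ml of the 1 M IPTG stock.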

[0178] The cells are then stirred for 3-4 hours at 4°C in 6M guanidine-HCl, pH8. The
cell debris is removed by centrifugation, and the supernatant containing the E. coli J96
PAI protein is dialyzed against 50 mM Na-acetate buffer pH6, supplemented with 200 mM
NaCl. Alternatively, the protein can be successfully refolded by dialyzing it against 500
mM NaCl, 20% glycerol, 25 mM Tris/HCl pH7.4, containing protease inhibitors. After
renaturation the protein can be purified by ion exchange, hydrophobic interaction and size
exclusion chromatography. Alternatively, an affinity chromatography step such as an
antibody column can be used to obtain pure E. coli J96 PAI protein. The purified protein
is stored at 4°C or frozen at -80°C.

Example 5: Cloning and Expression of an E. coli J96 PAI protein in a Baculovirus 
Expression System 

[0179] An E. coli J96 PAI ORF described in Tables 1 through 6 is selected and 
amplified as above. The plasmid is digested with appropriate restriction enzymes and 
optionally, can be dephosphorylated using calf intestinal phosphatase, using routine
procedures known in the art. The DNA is then isolated from a 1% agarose gel using a
commercially available kit ("Geneclean" BIO 101 Inc., La Jolla, Ca.). This vector DNA is
designated herein "V1".

[0180] Fragment F1 and the dephosphorylated plasmid V1 are ligated together with T4
DNA ligase. E. coli HB101 or other suitable E. coli hosts such as XL-1 Blue (Stratagene
Cloning Systems, La Jolla, CA) cells are transformed with the ligation mixture and spread 
on culture plates. Bacteria are identified that contain the plasmid with the E. coli J96 PAI 
gene by digesting DNA from individual colonies using appropriate restriction enzymes 
and then analyzing the digestion product by gel electrophoresis. The sequence of the 
cloned fragment is confirmed by DNA sequencing. This plasmid is designated herein 
pBac E. coli J96. 

[0181] Five µg of the plasmid pBac E. coli J96 is co-transfected with 1.0 µg of a
commercially available linearized baculovirus DNA ("BaculoGold baculovirus DNA",
Pharmingen, San Diego, CA.), using the lipofection method described by Felgner et al.,
Proc. Natl. Acad. Sci. USA 84:7413-7417 (1987). 1 µg of BaculoGold virus DNA and 5
µg of the plasmid pBac E. coli J96 are mixed in a sterile well of a microtiter plate
containing 50 µl of serum-free Grace's medium (Life Technologies Inc., Gaithersburg,
MD). Afterwards, 10 µl Lipofectin plus 90 µl Grace's medium are added, mixed and
incubated for 15 minutes at room temperature. Then the transfection mixture is added
drop-wise to Sf9 insect cells (ATCC CRL 1711) seeded in a 35 mm tissue culture plate
with 1 ml Grace's medium without serum. The plate is rocked back and forth to mix the
newly added solution. The plate is then incubated for 5 hours at 27°C. After 5 hours the
transfection solution is removed from the plate and 1 ml of Grace's insect medium
supplemented with 10% fetal calf serum is added. The plate is put back into an incubator
and cultivation is continued at 27°C for four days. 

[0182] After four days the supernatant is collected and a plaque assay is performed, as
described by Summers and Smith, supra. An agarose gel with "Blue Gal" (Life 
Technologies Inc.) is used to allow easy identification and isolation of gal-expressing 
clones, which produce blue-stained plaques. (A detailed description of a "plaque assay" of 
this type can also be found in the user's guide for insect cell culture and baculovirology 
distributed by Life Technologies Inc., page 9-10). After appropriate incubation, blue 
stained plaques are picked with the tip of a micropipettor (e.g., Eppendorf). The agar 
containing the recombinant viruses is then resuspended in a microcentrifuge tube
containing 200 µl of Grace's medium and the suspension containing the recombinant
baculovirus is used to infect Sf9 cells seeded in 35 mm dishes. Four days later the
supernatants of these culture dishes are harvested and then stored. The
recombinant virus is called V-E. coli J96.

[0183] To verify the expression of the E. coli gene, Sf9 cells are grown in Grace's
medium supplemented with 10% heat inactivated FBS. The cells are infected with the
recombinant baculovirus V-E. coli J96 at a multiplicity of infection ("MOI") of about 2.
Six hours later the medium is removed and is replaced with SF900 II medium minus
methionine and cysteine (available from Life Technologies Inc.). If radiolabeled proteins
are desired, 42 hours later, 5 µCi of 35S-methionine and 5 µCi of 35S-cysteine (available from
Amersham) are added. The cells are further incubated for 16 hours and then they are
harvested by centrifugation. The proteins in the supernatant as well as the intracellular
proteins are analyzed by SDS-PAGE followed by autoradiography (if radiolabeled).
Microsequencing of the amino acid sequence of the amino terminus of purified protein
may be used to determine the amino terminal sequence of the mature protein and thus the
cleavage point and length of the secretory signal peptide.

Example 6: Cloning and Expression in Mammalian Cells 

[0184] Most of the vectors used for the transient expression of an E. coli J96 PAI gene
in mammalian cells should carry the SV40 origin of replication. This allows the
replication of the vector to high copy numbers in cells (e.g., COS cells) which express the 
T antigen required for the initiation of viral DNA synthesis. Any other mammalian cell 
line can also be utilized for this purpose. 

[0185] A typical mammalian expression vector contains the promoter element, which 
mediates the initiation of transcription of mRNA, the protein coding sequence, and signals 
required for the termination of transcription and polyadenylation of the transcript. 
Additional elements include enhancers, Kozak sequences and intervening sequences 
flanked by donor and acceptor sites for RNA splicing. Highly efficient transcription can 
be achieved with the early and late promoters from SV40, the long terminal repeats 
(LTRs) from retroviruses, e.g., RSV, HTLV-I, HIV-I and the early promoter of the
cytomegalovirus (CMV). However, cellular elements can also be used (e.g., the human
actin promoter). Suitable expression vectors for use in practicing the present invention
include, for example, vectors such as pSVL and pMSG (Pharmacia, Uppsala, Sweden),
pRSVcat (ATCC 37152), pSV2dhfr (ATCC 37146) and pBC12MI (ATCC 67109).
Mammalian host cells that could be used include human HeLa, 293, H9 and Jurkat cells,
mouse NIH3T3 and C127 cells, Cos 1, Cos 7 and CV1, quail QC1-3 cells, mouse L cells
and Chinese hamster ovary (CHO) cells.

[0186] Alternatively, the gene can be expressed in stable cell lines that contain the
gene integrated into a chromosome. The co-transfection with a selectable marker such as
dhfr, gpt, neomycin or hygromycin allows the identification and isolation of the transfected
cells. 

[0187] The transfected gene can also be amplified to express large amounts of the 
encoded protein. The DHFR (dihydrofolate reductase) marker is useful to develop cell 
lines that carry several hundred or even several thousand copies of the gene of interest. 
Another useful selection marker is the enzyme glutamine synthetase (GS) (Murphy et al.,
Biochem J. 227:277-279 (1991); Bebbington et al., Bio/Technology 10:169-175 (1992)).
Using these markers, the mammalian cells are grown in selective medium and the cells 
with the highest resistance are selected. These cell lines contain the amplified gene(s) 
integrated into a chromosome. Chinese hamster ovary (CHO) and NSO cells are often 
used for the production of proteins. 

[0188] The expression vectors pC1 and pC4 contain the strong promoter (LTR) of the
Rous Sarcoma Virus (Cullen et al., Molecular and Cellular Biology, 438-447 (March,
1985)) plus a fragment of the CMV-enhancer (Boshart et al., Cell 41:521-530 (1985)).
Multiple cloning sites, e.g., with the restriction enzyme cleavage sites BamHI, XbaI and
Asp718, facilitate the cloning of the gene of interest. The vectors contain in addition the 3'
intron, the polyadenylation and termination signal of the rat preproinsulin gene.

Example 6(a): Cloning and Expression in COS Cells 

[0189] The expression plasmid, p E. coli J96HA, is made by cloning a cDNA
encoding E. coli J96 PAI protein into the expression vector pcDNAI/Amp or pcDNAIII 
(which can be obtained from Invitrogen, Inc.). 

[0190] The expression vector pcDNAI/Amp contains: (1) an E. coli origin of
replication effective for propagation in E. coli and other prokaryotic cells; (2) an 
ampicillin resistance gene for selection of plasmid-containing prokaryotic cells; (3) an 
SV40 origin of replication for propagation in eukaryotic cells; (4) a CMV promoter, a 
polylinker, an SV40 intron; (5) several codons encoding a hemagglutinin fragment (i.e., an 
"HA" tag to facilitate purification) followed by a termination codon and polyadenylation 
signal arranged so that a cDNA can be conveniently placed under expression control of the 
CMV promoter and operably linked to the SV40 intron and the polyadenylation signal by 
means of restriction sites in the polylinker. The HA tag corresponds to an epitope derived 
from the influenza hemagglutinin protein described by Wilson et al, Cell 37:161 (1984). 
The fusion of the HA tag to the target protein allows easy detection and recovery of the
recombinant protein with an antibody that recognizes the HA epitope. pcDNAIII contains,
in addition, the selectable neomycin marker. 

[0191] A DNA fragment encoding the E. coli J96 PAI protein is cloned into the 
polylinker region of the vector so that recombinant protein expression is directed by the 
CMV promoter. The plasmid construction strategy is as follows. The E. coli cDNA of the 
deposited clone is amplified using primers that contain convenient restriction sites, much 
as described above for construction of vectors for expression of E. coli J96 PAI protein in 
E. coli.
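
The primer construction described above can be illustrated with a minimal sketch. It assumes the common convention of placing a short 5' clamp and a restriction site ahead of the gene-specific bases; the clamp, the annealing length and the example open reading frame are hypothetical placeholders and are not taken from the deposited clone. The site names are those listed in this document for the multiple cloning sites of pC1 and pC4, used here only as examples.

```python
# Recognition sequences of restriction sites named in this document.
RESTRICTION_SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA", "Asp718": "GGTACC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(orf, fwd_site, rev_site, anneal_len=18, clamp="GCGC"):
    """Build forward and reverse primers carrying restriction sites.

    Each primer is clamp + restriction site + gene-specific bases.
    The clamp and anneal_len defaults are illustrative assumptions.
    """
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing length")
    for name in (fwd_site, rev_site):
        # A site that also occurs inside the insert would be cut on digestion.
        if RESTRICTION_SITES[name] in orf:
            raise ValueError(f"{name} site occurs inside the ORF")
    fwd = clamp + RESTRICTION_SITES[fwd_site] + orf[:anneal_len]
    rev = clamp + RESTRICTION_SITES[rev_site] + reverse_complement(orf[-anneal_len:])
    return fwd, rev


# Hypothetical placeholder ORF, not a J96 PAI sequence.
EXAMPLE_ORF = "ATGAAAACCCTGCTGATTGCGGTGCTGAGCCTGGCGTAA"

if __name__ == "__main__":
    forward, reverse = design_primers(EXAMPLE_ORF, "BamHI", "XbaI")
    print("forward:", forward)
    print("reverse:", reverse)
```

Checking the insert for internal copies of the chosen sites before ordering primers avoids cutting the amplified fragment during the digestion step.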

[0192] The PCR amplified DNA fragment and the vector, pcDNAI/Amp, are digested 
with appropriate restriction enzymes for the chosen primer sequences and then ligated. 
The ligation mixture is transformed into E. coli strain SURE (available from Stratagene 
Cloning Systems, La Jolla, CA 92037), and the transformed culture is plated on ampicillin 
media plates which then are incubated to allow growth of ampicillin resistant colonies. 
Plasmid DNA is isolated from resistant colonies and examined by restriction analysis or 
other means for the presence of the E. coli J96 PAI protein-encoding fragment. 

[0193] For expression of recombinant E. coli J96 PAI protein, COS cells are
transfected with an expression vector, as described above, using DEAE-dextran, as
described, for instance, in Sambrook et al., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989). Cells are incubated
under conditions for expression of E. coli J96 PAI protein by the vector.

[0194] Expression of the E. coli J96 PAI-HA fusion protein is detected by
radiolabeling and immunoprecipitation, using methods described in, for example, Harlow
et al., Antibodies: A Laboratory Manual, 2nd Ed.; Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, New York (1988). To this end, two days after transfection, the cells
are labeled by incubation in media containing 35S-cysteine for 8 hours. The cells and the
media are collected, and the cells are washed and then lysed with detergent-containing
RIPA buffer: 150 mM NaCl, 1% NP-40, 0.1% SDS, 0.5% DOC, 50 mM TRIS,
pH 7.5, as described by Wilson et al. cited above. Proteins are precipitated from the cell
lysate and from the culture media using an HA-specific monoclonal antibody. The 
precipitated proteins then are analyzed by SDS-PAGE and autoradiography. An 
expression product of the expected size is seen in the cell lysate, which is not seen in 
negative controls. 
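
As an aid to preparing the RIPA buffer named above, the following minimal sketch computes how much of each stock solution to combine for a chosen final volume, using the dilution relation C1·V1 = C2·V2. The final concentrations are those of the recipe in the text; the stock concentrations (5 M NaCl, 10% detergent stocks, 1 M TRIS) are assumptions chosen for illustration and should be replaced with the stocks actually on hand.

```python
# (final concentration, stock concentration) in matching units per component.
# Final values follow the RIPA recipe in the text; stock values are assumed.
RIPA = {
    "NaCl": (150.0, 5000.0),         # mM, assumes a 5 M stock
    "NP-40": (1.0, 10.0),            # percent, assumes a 10% stock
    "SDS": (0.1, 10.0),              # percent, assumes a 10% stock
    "DOC": (0.5, 10.0),              # percent, assumes a 10% stock
    "TRIS pH 7.5": (50.0, 1000.0),   # mM, assumes a 1 M stock
}


def ripa_volumes(total_ml):
    """Return the volume in ml of each stock, plus water, for total_ml of buffer."""
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    volumes = {
        name: total_ml * final / stock for name, (final, stock) in RIPA.items()
    }
    # Water makes up whatever volume the stocks do not fill.
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for component, ml in ripa_volumes(100.0).items():
        print(f"{component:12s} {ml:6.2f} ml")
```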

Example 6(b): Cloning and Expression in CHO Cells 

[0195] The vector pC4 is used for the expression of an E. coli J96 PAI protein.
Plasmid pC4 is a derivative of the plasmid pSV2-dhfr (ATCC Acc. No. 37146). The
plasmid contains the mouse DHFR gene under control of the SV40 early promoter.
Chinese hamster ovary or other cells lacking dihydrofolate reductase activity that are transfected
with these plasmids can be selected by growing the cells in a selective medium (alpha
minus MEM, Life Technologies, Inc.) supplemented with the chemotherapeutic agent
methotrexate. The amplification of the DHFR genes in cells resistant to methotrexate
(MTX) has been well documented (see, e.g., Alt, F. W. et al., 1978, J. Biol. Chem.
253:1357-1370; Hamlin, J. L. and Ma, C. 1990, Biochim. et Biophys. Acta 1097:107-143;
Page, M. J. and Sydenham, M. A. 1991, Biotechnology 9:64-68). Cells grown in
increasing concentrations of MTX develop resistance to the drug by overproducing the 
target enzyme, DHFR, as a result of amplification of the DHFR gene. If a second gene is 
linked to the DHFR gene, it is usually co-amplified and over-expressed. It is known in the 
art that this approach may be used to develop cell lines carrying more than 1,000 copies of
the amplified gene(s). Subsequently, when the methotrexate is withdrawn, cell lines are 
obtained which contain the amplified gene integrated into one or more chromosome(s) of 
the host cell. 

[0196] For expressing the gene of interest, plasmid pC4 contains the strong promoter of
the long terminal repeat (LTR) of the Rous Sarcoma Virus (Cullen et al., Molecular and
Cellular Biology, March 1985:438-447) plus a fragment isolated from the enhancer of the
immediate early gene of human cytomegalovirus (CMV) (Boshart et al., Cell 41:521-530
(1985)). Downstream of the promoter is a BamHI restriction enzyme site that allows the
integration of the gene. Behind this cloning site the plasmid contains the 3' intron and
polyadenylation site of the rat preproinsulin gene. Other high efficiency promoters can
also be used for the expression, e.g., the human β-actin promoter, the SV40 early or late
promoters or the long terminal repeats from other retroviruses, e.g., HIV and HTLVI.
Clontech's Tet-Off and Tet-On gene expression systems and similar systems can be used
to express the E. coli protein in a regulated way in mammalian cells (Gossen, M., &
Bujard, H. 1992, Proc. Natl. Acad. Sci. USA 89:5547-5551). For the polyadenylation of
the mRNA other signals, e.g., from the human growth hormone or globin genes can be
used as well. Stable cell lines carrying a gene of interest integrated into the chromosomes 
can also be selected upon co-transfection with a selectable marker such as gpt, G418 or 
hygromycin. It is advantageous to use more than one selectable marker in the beginning, 
e.g., G418 plus methotrexate. 

[0197] The plasmid pC4 is digested with appropriate restriction enzymes and then 
dephosphorylated using calf intestinal phosphatase by procedures known in the art. The
vector is then isolated from a 1% agarose gel.

[0198] The DNA sequence encoding the complete E. coli J96 PAI protein including its
leader sequence is amplified using PCR oligonucleotide primers corresponding to the 5'
and 3' sequences of the gene.

[0199] The amplified fragment is digested with appropriate endonucleases for the
chosen primers and then purified again on a 1% agarose gel. The isolated fragment and
the dephosphorylated vector are then ligated with T4 DNA ligase. E. coli HB101 or XL-1
Blue cells are then transformed and bacteria are identified that contain the fragment 
inserted into plasmid pC4 using, for instance, restriction enzyme analysis. 

[0200] Chinese hamster ovary cells lacking an active DHFR gene are used for
transfection. 5 µg of the expression plasmid pC4 is cotransfected with 0.5 µg of the
plasmid pSV2neo using lipofectin (Felgner et al., supra). The plasmid pSV2neo contains a
dominant selectable marker, the neo gene from Tn5 encoding an enzyme that confers
resistance to a group of antibiotics including G418. The cells are seeded in alpha minus
MEM supplemented with 1 mg/ml G418. After 2 days, the cells are trypsinized and
seeded in hybridoma cloning plates (Greiner, Germany) in alpha minus MEM
supplemented with 10, 25, or 50 ng/ml of methotrexate plus 1 mg/ml G418. After about
10-14 days single clones are trypsinized and then seeded in 6-well petri dishes or 10 ml
flasks using different concentrations of methotrexate (50 nM, 100 nM, 200 nM, 400 nM,
800 nM). Clones growing at the highest concentrations of methotrexate are then
transferred to new 6-well plates containing even higher concentrations of methotrexate
(1 µM, 2 µM, 5 µM, 10 µM, 20 µM). The same procedure is repeated until clones are
obtained which grow at a concentration of 100-200 µM. Expression of the desired gene
product is analyzed, for instance, by SDS-PAGE and Western blot or by reversed phase 
HPLC analysis. 
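
The stepwise methotrexate selection described above mixes mass and molar units, so a minimal sketch for keeping the steps on one scale may be helpful. It assumes a molar mass of 454.44 g/mol for methotrexate (free acid) and reads the last two values of the second series as 10 µM and 20 µM, which is an assumption consistent with the stated endpoint of 100-200 µM. The function names are hypothetical.

```python
MTX_MOLAR_MASS = 454.44  # g/mol, methotrexate free acid (assumed form)

# Concentration steps from the protocol, all expressed in nM.
TIER_1_NM = [50, 100, 200, 400, 800]
TIER_2_NM = [1_000, 2_000, 5_000, 10_000, 20_000]  # 1 to 20 uM (assumed reading)
TARGET_RANGE_NM = (100_000, 200_000)               # 100 to 200 uM endpoint
INITIAL_NG_PER_ML = [10, 25, 50]                   # first selection, in ng/ml


def ng_per_ml_to_nM(ng_per_ml, molar_mass=MTX_MOLAR_MASS):
    """Convert a mass concentration in ng/ml to a molar concentration in nM."""
    # ng/ml equals ug/L; ug/L divided by g/mol gives umol/L, hence the factor 1000.
    return ng_per_ml / molar_mass * 1000.0


def next_steps(tolerated_nM, ladder=tuple(TIER_1_NM + TIER_2_NM)):
    """Return the listed concentrations above the one a clone already tolerates."""
    return [c for c in ladder if c > tolerated_nM]


def fold_to_target(tolerated_nM):
    """Fold increase still needed to reach the lower end of the target range."""
    return TARGET_RANGE_NM[0] / tolerated_nM


if __name__ == "__main__":
    for dose in INITIAL_NG_PER_ML:
        print(f"{dose} ng/ml is about {ng_per_ml_to_nM(dose):.0f} nM")
    print("after 800 nM, try:", next_steps(800))
    print("fold to target from 20 uM:", fold_to_target(20_000))
```

Expressing every step in nM makes it easy to confirm that each round of selection uses a higher concentration than the last.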

Example 7: Production of an Antibody to an E. coli J96 Pathogenicity Island Protein 

[0201] Substantially pure E. coli J96 PAI protein or polypeptide is isolated from the
transfected or transformed cells described above using an art-known method. The protein
can also be chemically synthesized. Concentration of protein in the final preparation is
adjusted, for example, by concentration on an Amicon filter device, to the level of a few
micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as
follows: 

I. Monoclonal Antibody Production by Hybridoma Fusion

[0202] Monoclonal antibody to epitopes of any of the peptides identified and isolated 
as described can be prepared from murine hybridomas according to the classical method of
Kohler and Milstein, Nature 256:495 (1975) or modifications of the methods thereof.
Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein 
over a period of a few weeks. The mouse is then sacrificed, and the antibody producing 
cells of the spleen isolated. The spleen cells are fused by means of polyethylene glycol 
with mouse myeloma cells, and the excess unfused cells destroyed by growth of the
system on selective media comprising aminopterin (HAT media). The successfully fused
cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where 
growth of the culture is continued. Antibody-producing clones are identified by detection 
of antibody in the supernatant fluid of the wells by immunoassay procedures, such as 
ELISA, as originally described by Engvall, E., Meth. Enzymol. 70:419 (1980), and
modified methods thereof. Selected positive clones can be expanded and their monoclonal
antibody product harvested for use. Detailed procedures for monoclonal antibody
production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier,
New York, Section 21-2 (1989).

II. Polyclonal Antibody Production by Immunization

[0203] Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single
protein can be prepared by immunizing suitable animals with the expressed protein
described above, which can be unmodified or modified to enhance immunogenicity.
Effective polyclonal antibody production is affected by many factors related both to the 
antigen and the host species. For example, small molecules tend to be less immunogenic 
than other molecules and may require the use of carriers and adjuvant. Also, host animals 
vary in response to site of inoculations and dose, with both inadequate and excessive doses
of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at
multiple intradermal sites appear to be most reliable. An effective immunization protocol
for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991
(1971).

[0204] Booster injections can be given at regular intervals, and antiserum harvested
when the antibody titer thereof, as determined semi-quantitatively, for
example, by double immunodiffusion in agar against known concentrations of
the antigen, begins to fall (see Ouchterlony, O. et al., Chap. 19 in: Handbook of
Experimental Immunology, Weir, D., ed., Blackwell (1973)).

[0205] Plateau concentration of
antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM). Affinity of
the antisera for the antigen is determined by preparing competitive binding curves, as 
described, for example, by Fisher, D., Chap. 42 in: Manual of Clinical Immunology, 2nd 
ed., Rose and Friedman (eds.), Amer. Soc. for Microbiol., Washington, D.C. (1980).

[0206] Antibody preparations prepared according to either protocol are useful in
quantitative immunoassays which determine concentrations of antigen-bearing substances 
in biological samples; they are also used semi-quantitatively or qualitatively to identify the 
presence of antigen in a biological sample. 

[0207] While the present invention has been described in some detail for purposes of 
clarity and understanding, one skilled in the art will appreciate that various changes in
form and detail can be made without departing from the true scope of the invention.

[0208] All patents, patent applications and publications recited herein are hereby
incorporated by reference. 